Signal modification method for efficient coding of speech signals

ABSTRACT

For determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, a feature of the sound signal is located in a previous frame, a corresponding feature of the sound signal is located in a current frame, and the long-term-prediction delay parameter is determined for the current frame while mapping, with the long term prediction, the signal feature of the previous frame with the corresponding signal feature of the current frame. In a signal modification method for implementation into a technique for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, each frame of the sound signal is partitioned into a plurality of signal segments, and at least a part of the signal segments of the frame are warped while constraining the warped signal segments inside the frame. For searching pitch pulses in a sound signal, a residual signal is produced by filtering the sound signal through a linear prediction analysis filter, a weighted sound signal is produced by processing the sound signal through a weighting filter, the weighted sound signal being indicative of signal periodicity, a synthesized weighted sound signal is produced by filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal through the weighting filter, a last pitch pulse of the sound signal of the previous frame is located from the residual signal, a pitch pulse prototype of given length is extracted around the position of the last pitch pulse of the sound signal of the previous frame using the synthesized weighted sound signal, and the pitch pulses are located in a current frame using the pitch pulse prototype.

FIELD OF THE INVENTION

The present invention relates generally to the encoding and decoding of sound signals in communication systems. More specifically, the present invention is concerned with a signal modification technique applicable to, in particular but not exclusively, code-excited linear prediction (CELP) coding.

BACKGROUND OF THE INVENTION

Demand for efficient digital narrow- and wideband speech coding techniques with a good trade-off between the subjective quality and bit rate is increasing in various application areas such as teleconferencing, multimedia, and wireless communications. Until recently, the telephone bandwidth constrained into a range of 200-3400 Hz has mainly been used in speech coding applications. However, wideband speech applications provide increased intelligibility and naturalness in communication compared to the conventional telephone bandwidth. A bandwidth in the range 50-7000 Hz has been found sufficient for delivering a good quality giving an impression of face-to-face communication. For general audio signals, this bandwidth gives an acceptable subjective quality, but is still lower than the quality of FM radio or CD that operate in ranges of 20-16000 Hz and 20-20000 Hz, respectively.

A speech encoder converts a speech signal into a digital bit stream which is transmitted over a communication channel or stored in a storage medium. The speech signal is digitized, that is, sampled and quantized with usually 16 bits per sample. The speech encoder has the role of representing these digital samples with a smaller number of bits while maintaining a good subjective speech quality. The speech decoder or synthesizer operates on the transmitted or stored bit stream and converts it back to a sound signal.

Code-Excited Linear Prediction (CELP) coding is one of the best techniques for achieving a good compromise between the subjective quality and bit rate. This coding technique is a basis of several speech coding standards both in wireless and wire line applications. In CELP coding, the sampled speech signal is processed in successive blocks of N samples usually called frames, where N is a predetermined number corresponding typically to 10-30 ms. A linear prediction (LP) filter is computed and transmitted every frame. The computation of the LP filter typically needs a look ahead, i.e. a 5-10 ms speech segment from the subsequent frame. The N-sample frame is divided into smaller blocks called subframes. Usually the number of subframes is three or four resulting in 4-10 ms subframes. In each subframe, an excitation signal is usually obtained from two components: a past excitation and an innovative, fixed-codebook excitation. The component formed from the past excitation is often referred to as the adaptive codebook or pitch excitation. The parameters characterizing the excitation signal are coded and transmitted to the decoder, where the reconstructed excitation signal is used as the input of the LP filter.

In conventional CELP coding, long term prediction for mapping the past excitation to the present is usually performed on a subframe basis. Long term prediction is characterized by a delay parameter and a pitch gain that are usually computed, coded and transmitted to the decoder for every subframe. At low bit rates, these parameters consume a substantial proportion of the available bit budget. Signal modification techniques [1-7] improve the performance of long term prediction at low bit rates by adjusting the signal to be coded. This is done by adapting the evolution of the pitch cycles in the speech signal to fit the long term prediction delay, which makes it possible to transmit only one delay parameter per frame. Signal modification is based on the premise that it is possible to render the difference between the modified speech signal and the original speech signal inaudible. The CELP coders utilizing signal modification are often referred to as generalized analysis-by-synthesis or relaxed CELP (RCELP) coders.

-   [1] W. B. Kleijn, P. Kroon, and D. Nahumi, “The RCELP speech-coding algorithm,” European Transactions on Telecommunications, Vol. 4, No. 5, pp. 573-582, 1994.
-   [2] W. B. Kleijn, R. P. Ramachandran, and P. Kroon, “Interpolation of the pitch-predictor parameters in analysis-by-synthesis speech coders,” IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 1, pp. 42-54, 1994.
-   [3] Y. Gao, A. Benyassine, J. Thyssen, H. Su, and E. Shlomot, “EX-CELP: A speech coding paradigm,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, Utah, U.S.A., pp. 689-692, 7-11 May 2001.
-   [4] U.S. Pat. No. 5,704,003, “RCELP coder,” Lucent Technologies Inc., (W. B. Kleijn and D. Nahumi), Filing Date: 19 Sep. 1995.
-   [5] European Patent Application 0 602 826 A2, “Time shifting for analysis-by-synthesis coding,” AT&T Corp., (B. Kleijn), Filing Date: 1 Dec. 1993.
-   [6] Patent Application WO 00/11653, “Speech encoder with continuous warping combined with long term prediction,” Conexant Systems Inc., (Y. Gao), Filing Date: 24 Aug. 1999.
-   [7] Patent Application WO 00/11654, “Speech encoder adaptively applying pitch preprocessing with continuous warping,” Conexant Systems Inc., (H. Su and Y. Gao), Filing Date: 24 Aug. 1999.

Signal modification techniques adjust the pitch of the signal to a predetermined delay contour. Long term prediction then maps the past excitation signal to the present subframe using this delay contour and scaling by a gain parameter. The delay contour is obtained straightforwardly by interpolating between two open-loop pitch estimates, the first obtained in the previous frame and the second in the current frame. Interpolation gives a delay value for every time instant of the frame. After the delay contour is available, the pitch in the subframe to be coded currently is adjusted to follow this artificial contour by warping, i.e. changing the time scale of the signal.
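
The interpolation between two open-loop pitch estimates can be sketched as follows. This is only a minimal illustration assuming plain linear interpolation; the function name, the frame length of 256 samples and the pitch values of 50 and 54 samples are hypothetical and are not taken from the cited techniques.

```python
def linear_delay_contour(d_prev, d_curr, frame_len):
    """Return one delay value (in samples) per sample of the current frame.

    d_prev: open-loop pitch estimate obtained in the previous frame.
    d_curr: open-loop pitch estimate obtained in the current frame.
    The contour reaches d_curr at the last sample of the frame.
    """
    contour = []
    for t in range(1, frame_len + 1):
        frac = t / frame_len  # 0 < frac <= 1 across the frame
        contour.append((1.0 - frac) * d_prev + frac * d_curr)
    return contour


# Hypothetical values: pitch drifts from 50 to 54 samples over a 256-sample frame.
contour = linear_delay_contour(50.0, 54.0, 256)
```

The resulting list holds a delay for every time instant of the frame, which is the input that the warping step needs.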

In discontinuous warping [1, 4 and 5], a signal segment is shifted in time without altering the segment length. Discontinuous warping requires a procedure for handling the resulting overlapping or missing signal portions. Continuous warping [2, 3, 6, 7] either contracts or expands a signal segment. This is done using a time continuous approximation for the signal segment and re-sampling it to a desired length with unequal sampling intervals determined based on the delay contour. For reducing artifacts in these operations, the tolerated change in the time scale is kept small. Moreover, warping is typically done using the LP residual signal or the weighted speech signal to reduce the resulting distortions. The use of these signals instead of the speech signal also facilitates detection of pitch pulses and low-power regions in between them, and thus the determination of the signal segments for warping. The actual modified speech signal is generated by inverse filtering.
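
The contraction or expansion of a segment in continuous warping can be illustrated with a simplified sketch. It assumes a piecewise-linear approximation of the segment and uniform re-sampling intervals, whereas the cited techniques use unequal intervals derived from the delay contour; the function name is hypothetical.

```python
def resample_segment(segment, new_len):
    """Stretch or shrink a segment to new_len samples by linear interpolation.

    The first and last samples are preserved; intermediate samples are read
    from a piecewise-linear approximation of the original segment.
    """
    if new_len < 2 or len(segment) < 2:
        raise ValueError("need at least two samples on each side")
    step = (len(segment) - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        pos = k * step                       # fractional read position
        i = min(int(pos), len(segment) - 2)  # left neighbour index
        frac = pos - i
        out.append((1.0 - frac) * segment[i] + frac * segment[i + 1])
    return out


expanded = resample_segment([0.0, 1.0, 2.0, 3.0], 7)
contracted = resample_segment([0.0, 1.0, 2.0, 3.0, 4.0], 3)
```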

After the signal modification is done for the current subframe, the coding can proceed in any conventional manner except that the adaptive codebook excitation is generated using the predetermined delay contour. Essentially the same signal modification techniques can be used both in narrow- and wideband CELP coding.

Signal modification techniques can also be applied in other types of speech coding methods such as waveform interpolation coding and sinusoidal coding, for instance in accordance with [8].

-   [8] U.S. Pat. No. 6,223,151, “Method and apparatus for pre-processing speech signals prior to coding by transform-based speech coders,” Telefonaktiebolaget LM Ericsson, (W. B. Kleijn and T. Eriksson), Filing Date: 10 Feb. 1999.

SUMMARY OF THE INVENTION

The present invention relates to a method for determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, comprising dividing the sound signal into a series of successive frames, locating a feature of the sound signal in a previous frame, locating a corresponding feature of the sound signal in a current frame, and determining the long-term-prediction delay parameter for the current frame such that the long term prediction maps the signal feature of the previous frame to the corresponding signal feature of the current frame.

The subject invention is concerned with a device for determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, comprising a divider of the sound signal into a series of successive frames, a detector of a feature of the sound signal in a previous frame, a detector of a corresponding feature of the sound signal in a current frame, and a calculator of the long-term-prediction delay parameter for the current frame, the calculation of the long-term-prediction delay parameter being made such that the long term prediction maps the signal feature of the previous frame to the corresponding signal feature of the current frame.

According to the invention, there is provided a signal modification method for implementation into a technique for digitally encoding a sound signal, comprising dividing the sound signal into a series of successive frames, partitioning each frame of the sound signal into a plurality of signal segments, and warping at least a part of the signal segments of the frame, this warping comprising constraining the warped signal segments inside the frame.

In accordance with the present invention, there is provided a signal modification device for implementation into a technique for digitally encoding a sound signal, comprising a first divider of the sound signal into a series of successive frames, a second divider of each frame of the sound signal into a plurality of signal segments, and a signal segment warping member supplied with at least a part of the signal segments of the frame, this warping member comprising a constrainer of the warped signal segments inside the frame.

The present invention also relates to a method for searching pitch pulses in a sound signal, comprising dividing the sound signal into a series of successive frames, dividing each frame into a number of subframes, producing a residual signal by filtering the sound signal through a linear prediction analysis filter, locating a last pitch pulse of the sound signal of the previous frame from the residual signal, extracting a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame using the residual signal, and locating pitch pulses in a current frame using the pitch pulse prototype.

The present invention is also concerned with a device for searching pitch pulses in a sound signal, comprising a divider of the sound signal into a series of successive frames, a divider of each frame into a number of subframes, a linear prediction analysis filter for filtering the sound signal and thereby producing a residual signal, a detector of a last pitch pulse of the sound signal of the previous frame in response to the residual signal, an extractor of a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame in response to the residual signal, and a detector of pitch pulses in a current frame using the pitch pulse prototype.

According to the invention, there is also provided a method for searching pitch pulses in a sound signal, comprising dividing the sound signal into a series of successive frames, dividing each frame into a number of subframes, producing a weighted sound signal by processing the sound signal through a weighting filter wherein the weighted sound signal is indicative of signal periodicity, locating a last pitch pulse of the sound signal of the previous frame from the weighted sound signal, extracting a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame using the weighted sound signal, and locating pitch pulses in a current frame using the pitch pulse prototype.

Also in accordance with the present invention, there is provided a device for searching pitch pulses in a sound signal, comprising a divider of the sound signal into a series of successive frames, a divider of each frame into a number of subframes, a weighting filter for processing the sound signal to produce a weighted sound signal wherein the weighted sound signal is indicative of signal periodicity, a detector of a last pitch pulse of the sound signal of the previous frame in response to the weighted sound signal, an extractor of a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame in response to the weighted sound signal, and a detector of pitch pulses in a current frame using the pitch pulse prototype.

The present invention further relates to a method for searching pitch pulses in a sound signal, comprising dividing the sound signal into a series of successive frames, dividing each frame into a number of subframes, producing a synthesized weighted sound signal by filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal through a weighting filter, locating a last pitch pulse of the sound signal of the previous frame from the synthesized weighted sound signal, extracting a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame using the synthesized weighted sound signal, and locating pitch pulses in a current frame using the pitch pulse prototype.

The present invention is further concerned with a device for searching pitch pulses in a sound signal, comprising a divider of the sound signal into a series of successive frames, a divider of each frame into a number of subframes, a weighting filter for filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal and thereby producing a synthesized weighted sound signal, a detector of a last pitch pulse of the sound signal of the previous frame in response to the synthesized weighted sound signal, an extractor of a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame in response to the synthesized weighted sound signal, and a detector of pitch pulses in a current frame using the pitch pulse prototype.

According to the invention, there is further provided a method for forming an adaptive codebook excitation during decoding of a sound signal divided into successive frames and previously encoded by means of a technique using signal modification for digitally encoding the sound signal, comprising:

-   receiving, for each frame, a long-term-prediction delay parameter characterizing a long term prediction in the digital sound signal encoding technique;
-   recovering a delay contour using the long-term-prediction delay parameter received during a current frame and the long-term-prediction delay parameter received during a previous frame, wherein the delay contour, with long term prediction, maps a signal feature of the previous frame to a corresponding signal feature of the current frame; and
-   forming the adaptive codebook excitation in an adaptive codebook in response to the delay contour.

Further in accordance with the present invention, there is provided a device for forming an adaptive codebook excitation during decoding of a sound signal divided into successive frames and previously encoded by means of a technique using signal modification for digitally encoding the sound signal, comprising:

-   a receiver of a long-term-prediction delay parameter of each frame, wherein the long-term-prediction delay parameter characterizes a long term prediction in the digital sound signal encoding technique;
-   a calculator of a delay contour in response to the long-term-prediction delay parameter received during a current frame and the long-term-prediction delay parameter received during a previous frame, wherein the delay contour, with long term prediction, maps a signal feature of the previous frame to a corresponding signal feature of the current frame; and
-   an adaptive codebook for forming the adaptive codebook excitation in response to the delay contour.

The foregoing and other objects, advantages and features of the present invention will become more apparent upon reading of the following non-restrictive description of illustrative embodiments thereof, given by way of example only with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustrative example of original and modified residual signals for one frame;

FIG. 2 is a functional block diagram of an illustrative embodiment of a signal modification method according to the invention;

FIG. 3 is a schematic block diagram of an illustrative example of a speech communication system showing the use of a speech encoder and decoder;

FIG. 4 is a schematic block diagram of an illustrative embodiment of a speech encoder that utilizes a signal modification method;

FIG. 5 is a functional block diagram of an illustrative embodiment of a pitch pulse search;

FIG. 6 is an illustrative example of located pitch pulse positions and a corresponding pitch cycle segmentation for one frame;

FIG. 7 is an illustrative example on determining a delay parameter when the number of pitch pulses is three (c=3);

FIG. 8 is an illustrative example of delay interpolation (thick line) over a speech frame compared to linear interpolation (thin line);

FIG. 9 is an illustrative example of a delay contour over ten frames selected in accordance with the delay interpolation (thick line) of FIG. 8 and linear interpolation (thin line) when the correct pitch value is 52 samples;

FIG. 10 is a functional block diagram of the signal modification method that adjusts the speech frame to the selected delay contour in accordance with an illustrative embodiment of the present invention;

FIG. 11 is an illustrative example on updating the target signal $\tilde{w}(t)$ using a determined optimal shift a, and on replacing the signal segment $w_s(k)$ with interpolated values shown as gray dots;

FIG. 12 is a functional block diagram of a rate determination logic in accordance with an illustrative embodiment of the present invention; and

FIG. 13 is a schematic block diagram of an illustrative embodiment of a speech decoder that utilizes the delay contour formed in accordance with an illustrative embodiment of the present invention.

DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS

Although the illustrative embodiments of the present invention will be described in relation to speech signals and the 3GPP AMR Wideband Speech Codec AMR-WB Standard (ITU-T G.722.2), it should be kept in mind that the concepts of the present invention may be applied to other types of sound signals as well as other speech and audio coders.

FIG. 1 illustrates an example of a modified residual signal 12 within one frame. As shown in FIG. 1, the time shift in the modified residual signal 12 is constrained such that this modified residual signal is time synchronous with the original, unmodified residual signal 11 at frame boundaries occurring at time instants $t_{n-1}$ and $t_n$. Here n refers to the index of the present frame.

More specifically, the time shift is controlled implicitly with a delay contour employed for interpolating the delay parameter over the current frame. The delay parameter and contour are determined considering the time alignment constraints at the above-mentioned frame boundaries. When linear interpolation is used to force the time alignment, the resulting delay parameters tend to oscillate over several frames. This often causes annoying artifacts in the modified signal, whose pitch follows the artificial oscillating delay contour. Use of a properly chosen nonlinear interpolation technique for the delay parameter will substantially reduce these oscillations.

A functional block diagram of the illustrative embodiment of the signal modification method according to the invention is presented in FIG. 2.

The method starts, in “pitch cycle search” block 101, by locating individual pitch pulses and pitch cycles. The search of block 101 utilizes an open-loop pitch estimate interpolated over the frame. Based on the located pitch pulses, the frame is divided into pitch cycle segments, each containing one pitch pulse and restricted inside the frame boundaries $t_{n-1}$ and $t_n$.
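
One way to picture the division of a frame into pitch cycle segments is sketched here. The placement of segment boundaries midway between adjacent pulses is an assumption made purely for illustration, and the function name, pulse positions and frame length are hypothetical; the passage above only requires that each segment contain one pitch pulse and stay inside the frame.

```python
def segment_frame(pulse_positions, frame_len):
    """Split a frame into (start, end) segments, end exclusive, one per pulse.

    Assumption: boundaries sit midway between adjacent pulses. The first
    segment starts at the frame start and the last one ends at the frame end,
    so no segment crosses a frame boundary.
    """
    if not pulse_positions:
        return []
    pulses = sorted(pulse_positions)
    bounds = [0]
    for left, right in zip(pulses, pulses[1:]):
        bounds.append((left + right + 1) // 2)
    bounds.append(frame_len)
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical pulse positions for a 256-sample frame with a 52-sample pitch.
segments = segment_frame([20, 72, 124, 176, 228], 256)
```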

The function of the “delay curve selection” block 103 is to determine a delay parameter for the long term predictor and form a delay contour for interpolating this delay parameter over the frame. The delay parameter and contour are determined considering the time synchrony constraints at frame boundaries $t_{n-1}$ and $t_n$. The delay parameter determined in block 103 is coded and transmitted to the decoder when signal modification is enabled for the current frame.

The actual signal modification procedure is conducted in the “pitch synchronous signal modification” block 105. Block 105 first forms a target signal based on the delay contour determined in block 103 for subsequently matching the individual pitch cycle segments to this target signal. The pitch cycle segments are then shifted one by one to maximize their correlation with this target signal. To keep the complexity at a low level, no continuous time warping is applied while searching the optimal shift and shifting the segments.
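
The search for the shift that maximizes the correlation of a segment with the target signal might be sketched as below. The sketch assumes integer-sample shifts and an unnormalized correlation, and the function and parameter names are hypothetical; the resolution and correlation measure actually used in block 105 are not specified in this passage.

```python
def best_shift(segment, target, start, max_shift):
    """Return the integer shift in [-max_shift, max_shift] that maximizes
    the correlation between the segment and the target signal.

    start is the nominal position of the segment inside target.
    """
    best, best_corr = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        pos = start + shift
        if pos < 0 or pos + len(segment) > len(target):
            continue  # the shifted segment would fall outside the target
        corr = sum(s * target[pos + k] for k, s in enumerate(segment))
        if corr > best_corr:
            best, best_corr = shift, corr
    return best


# Toy data: the target has its pulse two samples later than the segment expects.
target = [0.0] * 20
target[12] = 1.0
segment = [0.0, 0.0, 1.0, 0.0, 0.0]
shift = best_shift(segment, target, start=8, max_shift=3)
```

Keeping the allowed shift range small reflects the cautious adjustments described for purely voiced frames.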

The illustrative embodiment of the signal modification method as disclosed in the present specification is typically enabled only on purely voiced speech frames. For instance, transition frames such as voiced onsets are not modified because of a high risk of causing artifacts. In purely voiced frames, pitch cycles usually change relatively slowly and therefore small shifts suffice to adapt the signal to the long term prediction model. Because only small, cautious signal adjustments are made, the probability of causing artifacts is minimized.

The signal modification method constitutes an efficient classifier for purely voiced segments, and hence a rate determination mechanism to be used in a source-controlled coding of speech signals. Each of blocks 101, 103 and 105 of FIG. 2 provides several indicators on signal periodicity and the suitability of signal modification in the current frame. These indicators are analyzed in logic blocks 102, 104 and 106 in order to determine a proper coding mode and bit rate for the current frame. More specifically, these logic blocks 102, 104 and 106 monitor the success of the operations conducted in blocks 101, 103, and 105.

If block 102 detects that the operation performed in block 101 is successful, the signal modification method is continued in block 103. When this block 102 detects a failure in the operation performed in block 101, the signal modification procedure is terminated and the original speech frame is preserved intact for coding (see block 108 corresponding to normal mode (no signal modification)).

If block 104 detects that the operation performed in block 103 is successful, the signal modification method is continued in block 105. When, on the contrary, this block 104 detects a failure in the operation performed in block 103, the signal modification procedure is terminated and the original speech frame is preserved intact for coding (see block 108 corresponding to normal mode (no signal modification)).

If block 106 detects that the operation performed in block 105 is successful, a low bit rate mode with signal modification is used (see block 107). On the contrary, when this block 106 detects a failure in the operation performed in block 105, the signal modification procedure is terminated, and the original speech frame is preserved intact for coding (see block 108 corresponding to normal mode (no signal modification)). The operation of the blocks 101-108 will be described in detail later in the present specification.
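
The decision flow of logic blocks 102, 104 and 106 can be summarized in a short sketch. The three boolean inputs stand in for the success indicators of blocks 101, 103 and 105; the function name and the mode labels are hypothetical.

```python
def select_coding_mode(pitch_search_ok, delay_selection_ok, modification_ok):
    """Mirror the decision flow of logic blocks 102, 104 and 106 of FIG. 2.

    Any failure terminates signal modification and keeps the original
    frame for coding in normal mode (block 108).
    """
    if not pitch_search_ok:       # block 102 checks block 101
        return "normal"
    if not delay_selection_ok:    # block 104 checks block 103
        return "normal"
    if not modification_ok:       # block 106 checks block 105
        return "normal"
    return "low_bit_rate_with_signal_modification"  # block 107


mode = select_coding_mode(True, True, False)
```

In an actual encoder the later stages would run only after the earlier checks pass, so the three indicators would be evaluated one at a time rather than passed in together.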

FIG. 3 is a schematic block diagram of an illustrative example of a speech communication system depicting the use of a speech encoder and decoder. The speech communication system of FIG. 3 supports transmission and reproduction of a speech signal across a communication channel 205. Although it may comprise for example a wire, an optical link or a fiber link, the communication channel 205 typically comprises at least in part a radio frequency link. The radio frequency link often supports multiple, simultaneous speech communications requiring shared bandwidth resources such as may be found with cellular telephony. Although not shown, the communication channel 205 may be replaced by a storage device that records and stores the encoded speech signal for later playback.

On the transmitter side, a microphone 201 produces an analog speech signal 210 that is supplied to an analog-to-digital (A/D) converter 202. The function of the A/D converter 202 is to convert the analog speech signal 210 into a digital speech signal 211. A speech encoder 203 encodes the digital speech signal 211 to produce a set of coding parameters 212 that are coded into binary form and delivered to a channel encoder 204. The channel encoder 204 adds redundancy to the binary representation of the coding parameters before transmitting them in a bitstream 213 over the communication channel 205.

On the receiver side, a channel decoder 206 is supplied with the above-mentioned redundant binary representation of the coding parameters from the received bitstream 214 to detect and correct channel errors that occurred in the transmission. A speech decoder 207 converts the channel-error-corrected bitstream 215 from the channel decoder 206 back to a set of coding parameters for creating a synthesized digital speech signal 216. The synthesized speech signal 216 reconstructed by the speech decoder 207 is converted to an analog speech signal 217 through a digital-to-analog (D/A) converter 208 and played back through a loudspeaker unit 209.

FIG. 4 is a schematic block diagram showing the operations performed by the illustrative embodiment of speech encoder 203 (FIG. 3) incorporating the signal modification functionality. The present specification presents a novel implementation of this signal modification functionality of block 603 in FIG. 4. The other operations performed by the speech encoder 203 are well known to those of ordinary skill in the art and have been described, for example, in the publication [10], which is incorporated herein by reference. When not stated otherwise, the implementation of the speech encoding and decoding operations in the illustrative embodiments and examples of the present invention will comply with the AMR Wideband Speech Codec (AMR-WB) Standard.

-   [10] 3GPP TS 26.190, “AMR Wideband Speech Codec: Transcoding Functions,” 3GPP Technical Specification.

The speech encoder 203 as shown in FIG. 4 encodes the digitized speech signal using one or a plurality of coding modes. When a plurality of coding modes are used and the signal modification functionality is disabled in one of these modes, this particular mode will operate in accordance with well established standards known to those of ordinary skill in the art.

Although not shown in FIG. 4, the speech signal is sampled at a rate of 16 kHz and each speech signal sample is digitized. The digital speech signal is then divided into successive frames of given length, and each of these frames is divided into a given number of successive subframes. The digital speech signal is further subjected to preprocessing as taught by the AMR-WB standard. This preprocessing includes high-pass filtering, pre-emphasis filtering using a filter $P(z) = 1 - 0.68z^{-1}$ and down-sampling from the sampling rate of 16 kHz to 12.8 kHz. The subsequent operations of FIG. 4 assume that the input speech signal s(t) has been preprocessed and down-sampled to the sampling rate of 12.8 kHz.
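
As an illustration of the pre-emphasis step only, the filter with factor 0.68 corresponds to the difference equation y(t) = x(t) − 0.68·x(t−1). The sketch below applies it to a list of samples; the function name and the `prev` argument, which carries the last sample of the preceding block, are hypothetical, and high-pass filtering and down-sampling are not covered.

```python
def pre_emphasis(samples, mu=0.68, prev=0.0):
    """Apply y(t) = x(t) - mu * x(t-1) to a block of samples.

    prev is the sample that preceded the block (0.0 at stream start).
    """
    out = []
    for x in samples:
        out.append(x - mu * prev)
        prev = x
    return out


emphasized = pre_emphasis([1.0, 1.0, 0.0])
```

A constant input is attenuated to 0.32 of its value while sample-to-sample changes pass largely unchanged, which is the intended high-frequency emphasis.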

The speech encoder 203 comprises an LP (Linear Prediction) analysis and quantization module 601 responsive to the input, preprocessed digital speech signal s(t) 617 to compute and quantize the parameters $a_0, a_1, a_2, \ldots, a_{n_A}$ of the LP filter $1/A(z)$, wherein $n_A$ is the order of the filter and $A(z) = a_0 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_{n_A} z^{-n_A}$. The binary representation 616 of these quantized LP filter parameters is supplied to the multiplexer 614 and subsequently multiplexed into the bitstream 615. The non-quantized and quantized LP filter parameters can be interpolated for obtaining the corresponding LP filter parameters for every subframe.
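
Filtering the speech signal through the analysis filter A(z) gives the LP residual used elsewhere in this description. The following is a minimal sketch of that filtering, assuming zero signal history before the first sample; the function name and the first-order coefficients in the usage line are hypothetical.

```python
def lp_residual(speech, a):
    """Filter speech through A(z) = a[0] + a[1] z^-1 + ... + a[nA] z^-nA.

    Samples before the start of the input are treated as zero.
    """
    residual = []
    for t in range(len(speech)):
        acc = 0.0
        for i, coeff in enumerate(a):
            if t - i >= 0:
                acc += coeff * speech[t - i]
        residual.append(acc)
    return residual


# A decaying exponential is perfectly predicted by a first-order filter,
# so only the first residual sample is non-zero.
residual = lp_residual([1.0, 0.5, 0.25, 0.125], [1.0, -0.5])
```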

The speech encoder 203 further comprises a pitch estimator 602 to compute open-loop pitch estimates 619 for the current frame in response to the LP filter parameters 618 from the LP analysis and quantization module 601. These open-loop pitch estimates 619 are interpolated over the frame to be used in a signal modification module 603.

The operations performed in the LP analysis and quantization module 601 and the pitch estimator 602 can be implemented in compliance with the above-mentioned AMR-WB Standard.

The signal modification module 603 of FIG. 4 performs a signal modification operation prior to the closed-loop pitch search of the adaptive codebook excitation signal for adjusting the speech signal to the determined delay contour d(t). In the illustrative embodiment, the delay contour d(t) defines a long term prediction delay for every sample of the frame. By construction the delay contour is fully characterized over the frame t ∈ (t_(n−1), t_(n)] by a delay parameter 620, d_(n)=d(t_(n)), and its previous value d_(n−1)=d(t_(n−1)), which are equal to the values of the delay contour at the frame boundaries. The delay parameter 620 is determined as a part of the signal modification operation, and coded and then supplied to the multiplexer 614 where it is multiplexed into the bitstream 615.

The delay contour d(t) defining a long term prediction delay parameter for every sample of the frame is supplied to an adaptive codebook 607. The adaptive codebook 607 is responsive to the delay contour d(t) to form the adaptive codebook excitation u_(b)(t) of the current subframe from the excitation u(t) using the delay contour d(t) as u_(b)(t)=u(t−d(t)). Thus the delay contour maps the past sample of the excitation signal u(t−d(t)) to the present sample in the adaptive codebook excitation u_(b)(t).
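By way of a non-limiting illustration, the mapping u_(b)(t)=u(t−d(t)) can be sketched in Python as follows. The function name adaptive_excitation, the linear interpolation used for fractional delays, and the reuse of already generated samples when the delay is shorter than the block are assumptions made for this sketch only; the illustrative embodiment does not prescribe them.

```python
import math


def adaptive_excitation(past, delay, length):
    """Return u_b(t) = u(t - d(t)) for t = 0 .. length-1.

    past   -- past excitation samples; past[-1] is the sample at t = -1
    delay  -- callable giving the delay contour d(t) in samples (d(t) >= 1)
    length -- number of samples to generate
    """
    buf = list(past)              # buf[offset + t] holds the sample at time t
    offset = len(past)
    out = []
    for t in range(length):
        pos = offset + t - delay(t)   # (possibly fractional) read position
        i = math.floor(pos)
        frac = pos - i
        if i < 0:
            raise ValueError("delay reaches before the stored history")
        if frac == 0:
            value = buf[i]
        else:
            # Assumption: linear interpolation between neighbouring samples.
            value = (1.0 - frac) * buf[i] + frac * buf[i + 1]
        out.append(value)
        # Assumption: generated samples extend the history, so that delays
        # shorter than `length` can be served from u_b itself.
        buf.append(value)
    return out
```

With a constant delay of four samples and the history 0, 1, . . . , 7, the sketch repeats the last four past samples periodically.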

The signal modification procedure also produces a modified residual signal ř(t) to be used for composing a modified target signal 621 for the closed-loop search of the fixed-codebook excitation u_(c)(t). The modified residual signal ř(t) is obtained in the signal modification module 603 by warping the pitch cycle segments of the LP residual signal, and is supplied to the computation of the modified target signal in module 604. The LP synthesis filtering of the modified residual signal with the filter 1/A(z) then yields, in module 604, the modified speech signal. The modified target signal 621 of the fixed-codebook excitation search is formed in module 604 in accordance with the operation of the AMR-WB Standard, but with the original speech signal replaced by its modified version.

After the adaptive codebook excitation u_(b)(t) and the modified target signal 621 have been obtained for the current subframe, the encoding can further proceed using conventional means.

The function of the closed-loop fixed-codebook excitation search is to determine the fixed-codebook excitation signal u_(c)(t) for the current subframe. To schematically illustrate the operation of the closed-loop fixed-codebook search, the fixed-codebook excitation u_(c)(t) is gain scaled through an amplifier 610. In the same manner, the adaptive-codebook excitation u_(b)(t) is gain scaled through an amplifier 609. The gain scaled adaptive and fixed-codebook excitations u_(b)(t) and u_(c)(t) are summed together through an adder 611 to form a total excitation signal u(t). This total excitation signal u(t) is processed through an LP synthesis filter 1/A(z) 612 to produce a synthesis speech signal 625 which is subtracted from the modified target signal 621 through an adder 605 to produce an error signal 626. An error weighting and minimization module 606 is responsive to the error signal 626 to calculate, according to conventional methods, the gain parameters for the amplifiers 609 and 610 every subframe. The error weighting and minimization module 606 further calculates, in accordance with conventional methods and in response to the error signal 626, the input 627 to the fixed codebook 608. The quantized gain parameters 622 and 623 and the parameters 624 characterizing the fixed-codebook excitation signal u_(c)(t) are supplied to the multiplexer 614 and multiplexed into the bitstream 615. The above procedure is done in the same manner whether signal modification is enabled or disabled.

It should be noted that, when the signal modification functionality is disabled, the adaptive excitation codebook 607 operates according to conventional methods. In this case, a separate delay parameter is searched for every subframe in the adaptive codebook 607 to refine the open-loop pitch estimates 619. These delay parameters are coded, supplied to the multiplexer 614 and multiplexed into the bitstream 615. Furthermore, the target signal 621 for the fixed-codebook search is formed in accordance with conventional methods.

The speech decoder as shown in FIG. 13 operates according to conventional methods except when signal modification is enabled. Signal modification disabled and enabled operation differs essentially only in the way the adaptive codebook excitation signal u_(b)(t) is formed. In both operational modes, the decoder decodes the received parameters from their binary representation. Typically the received parameters include excitation, gain, delay and LP parameters. The decoded excitation parameters are used in module 701 to form the fixed-codebook excitation signal u_(c)(t) for every subframe. This signal is supplied through an amplifier 702 to an adder 703. Similarly, the adaptive codebook excitation signal u_(b)(t) of the current subframe is supplied to the adder 703 through an amplifier 704. In the adder 703, the gain-scaled adaptive and fixed-codebook excitation signals u_(b)(t) and u_(c)(t) are summed together to form a total excitation signal u(t) for the current subframe. This excitation signal u(t) is processed through the LP synthesis filter 1/A(z) 708, which uses LP parameters interpolated in module 707 for the current subframe, to produce the synthesized speech signal ŝ(t).

When signal modification is enabled, the speech decoder recovers the delay contour d(t) in module 705 using the received delay parameter d_(n) and its previous received value d_(n−1) as in the encoder. This delay contour d(t) defines a long term prediction delay parameter for every time instant of the current frame. The adaptive codebook excitation u_(b)(t)=u(t−d(t)) is formed from the past excitation for the current subframe as in the encoder using the delay contour d(t).

The remaining description discloses the detailed operation of the signal modification procedure 603 as well as its use as a part of the mode determination mechanism.

Search of Pitch Pulses and Pitch Cycle Segments

The signal modification method operates pitch and frame synchronously, shifting each detected pitch cycle segment individually but constraining the shift at frame boundaries. This requires means for locating pitch pulses and corresponding pitch cycle segments for the current frame. In the illustrative embodiment of the signal modification method, pitch cycle segments are determined based on detected pitch pulses that are searched according to FIG. 5.

Pitch pulse search can operate on the residual signal r(t), the weighted speech signal w(t) and/or the weighted synthesized speech signal ŵ(t). The residual signal r(t) is obtained by filtering the speech signal s(t) with the LP filter A(z), which has been interpolated for the subframes. In the illustrative embodiment, the order of the LP filter A(z) is 16. The weighted speech signal w(t) is obtained by processing the speech signal s(t) through the weighting filter

$$W(z) = \frac{A(z/\gamma_1)}{1 - \gamma_2 z^{-1}}, \qquad (1)$$

where the coefficients γ₁=0.92 and γ₂=0.68. The weighted speech signal w(t) is often utilized in open-loop pitch estimation (module 602) since the weighting filter defined by Equation (1) attenuates the formant structure in the speech signal s(t), and preserves the periodicity also on sinusoidal signal segments. That facilitates pitch pulse search because possible signal periodicity becomes clearly apparent in weighted signals. It should be noted that the weighted speech signal w(t) is needed also for the look ahead in order to search the last pitch pulse in the current frame. This can be done by using the weighting filter of Equation (1) formed in the last subframe of the current frame over the look ahead portion.
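For illustration only, the two filtering operations of this paragraph can be sketched in Python. The function names lp_residual and weighted_speech, the zero initial filter state and the use of a single coefficient set for the whole signal (no per-subframe interpolation) are simplifying assumptions of the sketch.

```python
def lp_residual(s, a):
    """r(t) = sum_k a[k] * s(t - k): s filtered by A(z), zero initial state."""
    return [sum(a[k] * s[t - k] for k in range(len(a)) if t - k >= 0)
            for t in range(len(s))]


def weighted_speech(s, a, gamma1=0.92, gamma2=0.68):
    """Filter s through W(z) = A(z/gamma1) / (1 - gamma2 * z^-1)."""
    # A(z/gamma1) has coefficients a[k] * gamma1**k.
    a_w = [a[k] * gamma1 ** k for k in range(len(a))]
    numerator = lp_residual(s, a_w)
    out, prev = [], 0.0
    for x in numerator:
        prev = x + gamma2 * prev      # recursive part 1 / (1 - gamma2 z^-1)
        out.append(prev)
    return out
```

For a unit impulse and the hypothetical first-order filter A(z)=1−0.5z⁻¹, the first three output samples are 1, 0.22 and 0.1496.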

The pitch pulse search procedure of FIG. 5 starts in block 301 by locating the last pitch pulse of the previous frame from the residual signal r(t). A pitch pulse typically stands out clearly as the maximum absolute value of the low-pass filtered residual signal in a pitch cycle having a length of approximately p(t_(n−1)). A normalized Hamming window H₅(z)=(0.08z⁻²+0.54z⁻¹+1+0.54z+0.08z²)/2.24 having a length of five (5) samples is used for the low-pass filtering in order to facilitate the locating of the last pitch pulse of the previous frame. This pitch pulse position is denoted by T₀. The illustrative embodiment of the signal modification method according to the invention does not require an accurate position for this pitch pulse, but rather a rough location estimate of the high-energy segment in the pitch cycle.
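A minimal Python sketch of block 301 is given below. It assumes that the residual of the previous frame is available as a list r whose last element is the last sample of that frame, and that the search window is simply the last `pitch` samples; the names lowpass_h5 and last_pulse_position are hypothetical.

```python
H5 = (0.08, 0.54, 1.0, 0.54, 0.08)   # taps of H5(z) before division by 2.24


def lowpass_h5(r):
    """Zero-phase 5-tap smoothing of r; samples outside r are taken as zero."""
    n = len(r)
    out = []
    for t in range(n):
        acc = 0.0
        for k, h in enumerate(H5):
            idx = t + k - 2
            if 0 <= idx < n:
                acc += h * r[idx]
        out.append(acc / 2.24)
    return out


def last_pulse_position(r, pitch):
    """Index T0 of the largest |low-passed residual| in the last pitch cycle."""
    y = lowpass_h5(r)
    start = max(0, len(r) - pitch)    # assumption: window = last `pitch` samples
    return max(range(start, len(r)), key=lambda t: abs(y[t]))
```

Because only a rough location of the high-energy region is needed, no fractional refinement is attempted in this sketch.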

After locating the last pitch pulse at T₀ in the previous frame, a pitch pulse prototype of length 2l+1 samples is extracted in block 302 of FIG. 5 around this rough position estimate as, for example:

m_(n)(k) = ŵ(T₀ − l + k) for k = 0, 1, . . . , 2l.   (2)

This pitch pulse prototype is subsequently used in locating pitch pulses in the current frame.

The synthesized weighted speech signal ŵ(t) (or the weighted speech signal w(t)) can be used for the pulse prototype instead of the residual signal r(t). This facilitates pitch pulse search, because the periodic structure of the signal is better preserved in the weighted speech signal. The synthesized weighted speech signal ŵ(t) is obtained by filtering the synthesized speech signal ŝ(t) of the last subframe of the previous frame by the weighting filter W(z) of Equation (1). If the pitch pulse prototype extends over the end of the previously synthesized frame, the weighted speech signal w(t) of the current frame is used for this exceeding portion. The pitch pulse prototype has a high correlation with the pitch pulses of the weighted speech signal w(t) if the previous synthesized speech frame already contains a well-developed pitch cycle. Thus the use of the synthesized speech in extracting the prototype provides additional information for monitoring the performance of coding and selecting an appropriate coding mode in the current frame, as will be explained in more detail in the following description.

Selecting l=10 samples provides a good compromise between the complexity and performance in the pitch pulse search. The value of l can also be determined proportionally to the open-loop pitch estimate.

Given the position T₀ of the last pulse in the previous frame, the first pitch pulse of the current frame can be predicted to occur approximately at instant T₀+p(T₀). Here p(t) denotes the interpolated open-loop pitch estimate at instant (position) t. This prediction is performed in block 303.

In block 305, the predicted pitch pulse position T₀+p(T₀) is refined as

T₁ = T₀ + p(T₀) + arg max_j C(j),   (3)

where the weighted speech signal w(t) in the neighborhood of the predicted position is correlated with the pulse prototype:

$$C(j) = \gamma(j) \sum_{k=0}^{2l} m_n(k)\, w\left(T_0 + p(T_0) + j - l + k\right), \quad j \in [-j_{\max}, j_{\max}]. \qquad (4)$$

Thus the refinement is the argument j, limited into [−j_(max), j_(max)], that maximizes the weighted correlation C(j) between the pulse prototype and one of the above mentioned residual signal, weighted speech signal or weighted synthesized speech signal. According to an illustrative example, the limit j_(max) is proportional to the open-loop pitch estimate as min{20, <p(0)/4>}, where the operator <•> denotes rounding to the nearest integer. The weighting function

γ(j) = 1 − |j|/p(T₀+p(T₀))   (5)

in Equation (4) favors the pulse position predicted using the open-loop pitch estimate, since γ(j) attains its maximum value 1 at j=0. The denominator p(T₀+p(T₀)) in Equation (5) is the open-loop pitch estimate for the predicted pitch pulse position.
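The refinement of Equations (3) to (5) can be sketched in Python as follows, in integer resolution only. The function name refine_pulse and its argument names are hypothetical, the predicted position T₀+p(T₀) and the pitch estimate at that position are passed in as already computed integers, and candidate windows that would fall outside the available signal are skipped, which is an assumption of the sketch.

```python
def refine_pulse(w, proto, predicted, pitch_at_pred, j_max):
    """Return predicted + argmax_j C(j) following Equations (3)-(5).

    w             -- weighted speech samples, indexed by time
    proto         -- pulse prototype m_n(k), length 2l + 1
    predicted     -- integer predicted pulse position T0 + p(T0)
    pitch_at_pred -- open-loop pitch estimate p(T0 + p(T0))
    j_max         -- search limit, e.g. min(20, round(p / 4))
    """
    l = (len(proto) - 1) // 2
    best_j, best_c = 0, float("-inf")
    for j in range(-j_max, j_max + 1):
        start = predicted + j - l
        if start < 0 or start + 2 * l >= len(w):
            continue                      # window not fully inside w
        corr = sum(proto[k] * w[start + k] for k in range(2 * l + 1))
        c = (1.0 - abs(j) / pitch_at_pred) * corr   # gamma(j) * correlation
        if c > best_c:
            best_c, best_j = c, j
    return predicted + best_j
```

In the small example used for checking the sketch, a pulse centred at sample 33 is found from a prediction at sample 30, the weighting γ(j) reducing but not overriding the correlation peak.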

After the first pitch pulse position T₁ has been found using Equation (3), the next pitch pulse can be predicted to be at instant T₂=T₁+p(T₁) and refined as described above. This pitch pulse search comprising the prediction 303 and refinement 305 is repeated until either the prediction or refinement procedure yields a pitch pulse position outside the current frame. These conditions are checked in logic block 304 for the prediction of the position of the next pitch pulse (block 303) and in logic block 306 for the refinement of this position of the pitch pulse (block 305). It should be noted that the logic block 304 terminates the search only if a predicted pulse position is so far in the subsequent frame that the refinement step cannot bring it back to the current frame. This procedure yields c pitch pulse positions inside the current frame, denoted by T₁, T₂, . . . , T_(c).

According to an illustrative example, pitch pulses are located in the integer resolution except the last pitch pulse of the frame denoted by T_(c). Since the exact distance between the last pulses of two successive frames is needed to determine the delay parameter to be transmitted, the last pulse is located using a fractional resolution of ¼ sample in Equation (4) for j. The fractional resolution is obtained by upsampling w(t) in the neighborhood of the last predicted pitch pulse before evaluating the correlation of Equation (4). According to an illustrative example, Hamming-windowed sinc interpolation of length 33 is used for upsampling. The fractional resolution of the last pitch pulse position helps to maintain the good performance of long term prediction despite the time synchrony constraint set to the frame end. This is obtained at the cost of the additional bit rate needed for transmitting the delay parameter with a higher accuracy.

After completing pitch cycle segmentation in the current frame, an optimal shift for each segment is determined. This operation is done using the weighted speech signal w(t) as will be explained in the following description. For reducing the distortion caused by warping, the shifts of individual pitch cycle segments are implemented using the LP residual signal r(t). Since shifting distorts the signal particularly around segment boundaries, it is essential to place the boundaries in low-power sections of the residual signal r(t). In an illustrative example, the segment boundaries are placed approximately in the middle of two consecutive pitch pulses, but constrained inside the current frame. Segment boundaries are always selected inside the current frame such that each segment contains exactly one pitch pulse. Segments with more than one pitch pulse or “empty” segments without any pitch pulses hamper subsequent correlation-based matching with the target signal and should be prevented in pitch cycle segmentation. The s^(th) extracted segment of l_(s) samples is denoted as w_(s)(k) for k=0, 1, . . . , l_(s)−1. The starting instant of this segment is t_(s), selected such that w_(s)(0)=w(t_(s)). The number of segments in the present frame is denoted by c.

While selecting the segment boundary between two successive pitch pulses T_(s) and T_(s+1) inside the current frame, the following procedure is used. First the central instant between two pulses is computed as Λ=<(T_(s)+T_(s+1))/2>. The candidate positions for the segment boundary are located in the region [Λ−ε_(max), Λ+ε_(max)], where ε_(max) corresponds to five samples. The energy of each candidate boundary position is computed as

Q(ε′) = r²(Λ+ε′−1) + r²(Λ+ε′), ε′ ∈ [−ε_(max), ε_(max)].   (6)

The position giving the smallest energy is selected because this choice typically results in the smallest distortion in the modified speech signal. The instant that minimizes Equation (6) is denoted as ε. The starting instant of the new segment is selected as t_(s)=Λ+ε. This defines also the length of the previous segment, since the previous segment ends at instant Λ+ε−1.
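As an illustrative sketch only, the boundary selection of Equation (6) can be written in Python as follows. The function name segment_boundary is hypothetical, the residual r is assumed to be indexable at all candidate positions, and ties are resolved in favour of the earliest candidate, which the description does not specify.

```python
import math


def segment_boundary(r, t_a, t_b, eps_max=5):
    """Starting instant t_s = Lambda + eps of the segment between two pulses.

    r        -- LP residual samples, indexed by time
    t_a, t_b -- positions of two successive pitch pulses T_s and T_(s+1)
    """
    centre = math.floor((t_a + t_b) / 2 + 0.5)   # <.> rounds to nearest integer

    def energy(eps):
        # Q(eps) = r^2(Lambda + eps - 1) + r^2(Lambda + eps), Equation (6)
        return r[centre + eps - 1] ** 2 + r[centre + eps] ** 2

    best = min(range(-eps_max, eps_max + 1), key=energy)
    return centre + best
```

For pulses at samples 10 and 30 and a residual that is zero only at samples 18 and 19, the boundary is moved from the centre 20 to sample 19.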

FIG. 6 shows an illustrative example of pitch cycle segmentation. Note particularly the first and the last segment w₁(k) and w₄(k), respectively, extracted such that no empty segments result and the frame boundaries are not exceeded.

Determination of the Delay Parameter

Generally the main advantage of signal modification is that only one delay parameter per frame has to be coded and transmitted to the decoder (not shown). However, special attention has to be paid to the determination of this single parameter. The delay parameter not only defines together with its previous value the evolution of the pitch cycle length over the frame, but also affects time asynchrony in the resulting modified signal.

In the methods described in [1, 4-7]:

-   -   [1] W. B. Kleijn, P. Kroon, and D. Nahumi, “The RCELP speech-coding algorithm,” European Transactions on Telecommunications, Vol. 4, No. 5, pp. 573-582, 1994.
-   -   [4] U.S. Pat. No. 5,704,003, “RCELP coder,” Lucent Technologies Inc., (W. B. Kleijn and D. Nahumi), Filing Date 19 Sep. 1995.
-   -   [5] European Patent Application 0 602 826 A2, “Time shifting for analysis-by-synthesis coding,” AT&T Corp., (B. Kleijn), Filing Date 1 Dec. 1993.
-   -   [6] Patent Application WO 00/11653, “Speech encoder with continuous warping combined with long term prediction,” Conexant Systems Inc., (Y. Gao), Filing Date 24 Aug. 1999.
-   -   [7] Patent Application WO 00/11654, “Speech encoder adaptively applying pitch preprocessing with continuous warping,” Conexant Systems Inc., (H. Su and Y. Gao), Filing Date 24 Aug. 1999.

no time synchrony is required at frame boundaries, and thus the delay parameter to be transmitted can be determined straightforwardly using an open-loop pitch estimate. This selection usually results in a time asynchrony at the frame boundary, and translates to an accumulating time shift in the subsequent frame because the signal continuity has to be preserved. Although human hearing is insensitive to changes in the time scale of the synthesized speech signal, increasing time asynchrony complicates the encoder implementation. Indeed, long signal buffers are required to accommodate the signals whose time scale may have been expanded, and a control logic has to be implemented for limiting the accumulated shift during encoding. Also, time asynchrony of several samples typical in RCELP coding may cause mismatch between the LP parameters and the modified residual signal. This mismatch may result in perceptual artifacts to the modified speech signal that is synthesized by LP filtering the modified residual signal.

On the contrary, the illustrative embodiment of the signal modification method according to the present invention preserves the time synchrony at frame boundaries. Thus, a strictly constrained shift occurs at the frame ends and every new frame starts in perfect time match with the original speech frame.

To ensure time synchrony at the frame end, the delay contour d(t) maps, with the long term prediction, the last pitch pulse at the end of the previous synthesized speech frame to the pitch pulses of the current frame. The delay contour defines an interpolated long-term prediction delay parameter over the current n^(th) frame for every sample from instant t_(n−1)+1 through t_(n). Only the delay parameter d_(n)=d(t_(n)) at the frame end is transmitted to the decoder, implying that d(t) must have a form fully specified by the transmitted values. The long-term prediction delay parameter has to be selected such that the resulting delay contour fulfils the pulse mapping. In a mathematical form this mapping can be presented as follows: Let κ_(c) be a temporary time variable and T₀ and T_(c) the last pitch pulse positions in the previous and current frames, respectively. Now, the delay parameter d_(n) has to be selected such that, after executing the pseudo-code presented in Table 1, the variable κ_(c) has a value very close to T₀, minimizing the error |κ_(c)−T₀|. The pseudo-code starts from the value κ₀=T_(c) and iterates backwards c times by updating κ_(i):=κ_(i−1)−d(κ_(i−1)). If κ_(c) then equals T₀, long term prediction can be utilized with maximum efficiency without time asynchrony at the frame end.

TABLE 1
Loop for searching the optimal delay parameter.

    % initialization
    κ₀ := T_(c);
    % loop
    for i = 1 to c
        κ_(i) := κ_(i−1) − d(κ_(i−1));
    end;

An example of the operation of the delay selection loop in the case c=3 is illustrated in FIG. 7. The loop starts from the value κ₀=T_(c) and takes the first iteration backwards as κ₁=κ₀−d(κ₀). Iterations are continued twice more resulting in κ₂=κ₁−d(κ₁) and κ₃=κ₂−d(κ₂). The final value κ₃ is then compared against T₀ in terms of the error e_(n)=|κ₃−T₀|. The resulting error is a function of the delay contour that is adjusted in the delay selection algorithm as will be taught later in this specification.
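The loop of Table 1 translates almost directly into Python. In the following sketch the delay contour is passed as an arbitrary callable, and the function name backward_mapping_error is hypothetical.

```python
def backward_mapping_error(delay, t_c, t_0, c):
    """Run the loop of Table 1 and return e_n = |kappa_c - T0|.

    delay -- callable returning the delay contour value d(t)
    t_c   -- position T_c of the last pitch pulse in the current frame
    t_0   -- position T0 of the last pitch pulse in the previous frame
    c     -- number of backward iterations (pitch pulses in the frame)
    """
    kappa = t_c                          # kappa_0 := T_c
    for _ in range(c):
        kappa = kappa - delay(kappa)     # kappa_i := kappa_(i-1) - d(kappa_(i-1))
    return abs(kappa - t_0)
```

For example, with a constant delay of 50 samples, T_(c)=200 and c=3, the loop ends at κ₃=50, so that a previous pulse at T₀=48 gives an error of two samples.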

Signal modification methods such as those described in documents [1, 4, 6, 7] cited above interpolate the delay parameters linearly over the frame between d_(n−1) and d_(n). However, when time synchrony is required at the frame end, linear interpolation tends to result in an oscillating delay contour. Thus pitch cycles in the modified speech signal contract and expand periodically, easily causing annoying artifacts. The evolution and amplitude of the oscillations are related to the last pitch pulse position. The further the last pitch pulse is from the frame end in relation to the pitch period, the more likely the oscillations are amplified. Since the time synchrony at the frame end is an essential requirement of the illustrative embodiment of the signal modification method according to the present invention, linear interpolation familiar from the prior methods cannot be used without degrading the speech quality.

Instead, the illustrative embodiment of the signal modification method according to the present invention discloses a piecewise linear delay contour

$$d(t) = \begin{cases} (1 - \alpha(t))\, d_{n-1} + \alpha(t)\, d_n, & t_{n-1} < t < t_{n-1} + \sigma_n \\ d_n, & t_{n-1} + \sigma_n \le t \le t_n \end{cases} \qquad (7)$$

where

$$\alpha(t) = (t - t_{n-1}) / \sigma_n. \qquad (8)$$

Oscillations are significantly reduced by using this delay contour. Here t_(n) and t_(n−1) are the end instants of the current and previous frames, respectively, and d_(n) and d_(n−1) are the corresponding delay parameter values. Note that t_(n−1)+σ_(n) is the instant after which the delay contour remains constant.

In an illustrative example, the parameter σ_(n) varies as a function of d_(n−1) as

$$\sigma_n = \begin{cases} 172 \text{ samples}, & d_{n-1} \le 90 \text{ samples} \\ 128 \text{ samples}, & d_{n-1} > 90 \text{ samples} \end{cases} \qquad (9)$$

and the frame length N is 256 samples. To avoid oscillations, it is beneficial to decrease the value of σ_(n) as the length of the pitch cycle increases. On the other hand, to avoid rapid changes in the delay contour d(t) in the beginning of the frame, that is, for t_(n−1)<t<t_(n−1)+σ_(n), the parameter σ_(n) has to be always at least a half of the frame length. Rapid changes in d(t) easily degrade the quality of the modified speech signal.
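A short Python sketch of the piecewise linear delay contour with the illustrative values of σ_(n) is given below. The function names sigma_n and delay_contour are hypothetical, and the sketch rejects instants outside the current frame rather than extrapolating.

```python
def sigma_n(d_prev):
    """Length of the interpolated part of the contour, in samples."""
    return 172 if d_prev <= 90 else 128


def delay_contour(t, t_prev, t_end, d_prev, d_n):
    """Piecewise linear delay contour d(t) for t_prev < t <= t_end."""
    if not t_prev < t <= t_end:
        raise ValueError("t lies outside the current frame")
    sigma = sigma_n(d_prev)
    if t < t_prev + sigma:
        alpha = (t - t_prev) / sigma
        return (1.0 - alpha) * d_prev + alpha * d_n
    return d_n                       # constant after t_prev + sigma_n
```

With d_(n−1)=50, d_(n)=53 and a frame of 256 samples starting after t_(n−1)=0, the contour is half way between the two values at t=86 and stays at 53 from t=172 to the frame end.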

Note that depending on the coding mode of the previous frame, d_(n−1) can be either the delay value at the frame end (signal modification enabled) or the delay value of the last subframe (signal modification disabled). Since the past value d_(n−1) of the delay parameter is known at the decoder, the delay contour is unambiguously defined by d_(n), and the decoder is able to form the delay contour using Equation (7).

The only parameter which can be varied while searching the optimal delay contour is d_(n), the delay parameter value at the end of the frame constrained into [34, 231]. There is no simple explicit method for solving the optimal d_(n) in a general case. Instead, several values have to be tested to find the best solution. However, the search is straightforward. The value of d_(n) can be first predicted as

$$d_n^{(0)} = 2\,\frac{T_c - T_0}{c} - d_{n-1}. \qquad (10)$$

In the illustrative embodiment, the search is done in three phases by increasing the resolution and focusing the search range to be examined inside [34, 231] in every phase. The delay parameters giving the smallest error e_(n)=|κ_(c)−T₀| in the procedure of Table 1 in these three phases are denoted by d_(n)⁽¹⁾, d_(n)⁽²⁾, and d_(n)=d_(n)⁽³⁾, respectively. In the first phase, the search is done around the value d_(n)⁽⁰⁾ predicted using Equation (10) with a resolution of four samples in the range [d_(n)⁽⁰⁾−11, d_(n)⁽⁰⁾+12] when d_(n)⁽⁰⁾<60, and in the range [d_(n)⁽⁰⁾−15, d_(n)⁽⁰⁾+16] otherwise. The second phase constrains the range into [d_(n)⁽¹⁾−3, d_(n)⁽¹⁾+3] and uses the integer resolution. The last, third phase examines the range [d_(n)⁽²⁾−¾, d_(n)⁽²⁾+¾] with a resolution of ¼ sample for d_(n)⁽²⁾<92½. Above that, the range [d_(n)⁽²⁾−½, d_(n)⁽²⁾+½] and a resolution of ½ sample are used. This third phase yields the optimal delay parameter d_(n) to be transmitted to the decoder. This procedure is a compromise between the search accuracy and complexity. Of course, those of ordinary skill in the art can readily implement the search of the delay parameter under the time synchrony constraints using alternative means without departing from the nature and spirit of the present invention.
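The three-phase search can be sketched in Python under several stated assumptions: the predicted value of Equation (10) is rounded to an integer and clamped into [34, 231], candidates outside [34, 231] are discarded, and the contour is held at d_(n−1) for instants at or before the frame start. The helper names contour, mapping_error, grid and search_delay are hypothetical and not taken from the description.

```python
import math


def contour(t, t_prev, d_prev, d_n):
    """Piecewise linear contour of Equations (7)-(9)."""
    sigma = 172 if d_prev <= 90 else 128
    if t <= t_prev:
        return d_prev                 # assumption: hold d_prev before the frame
    if t < t_prev + sigma:
        alpha = (t - t_prev) / sigma
        return (1.0 - alpha) * d_prev + alpha * d_n
    return d_n


def mapping_error(d_n, d_prev, t_prev, t_0, t_c, c):
    """Error |kappa_c - T0| produced by the loop of Table 1."""
    kappa = t_c
    for _ in range(c):
        kappa -= contour(kappa, t_prev, d_prev, d_n)
    return abs(kappa - t_0)


def grid(start, stop, step):
    """Values start, start + step, ... not exceeding stop."""
    count = math.floor((stop - start) / step + 1e-9)
    return [start + i * step for i in range(count + 1)]


def search_delay(d_prev, t_prev, t_0, t_c, c, lo=34.0, hi=231.0):
    """Three-phase search for the delay parameter d_n."""
    def best(candidates):
        inside = [d for d in candidates if lo <= d <= hi]
        return min(inside, key=lambda d: mapping_error(
            d, d_prev, t_prev, t_0, t_c, c))

    # Equation (10); rounding and clamping are assumptions of this sketch.
    d0 = float(round(2.0 * (t_c - t_0) / c - d_prev))
    d0 = min(max(d0, lo), hi)
    # Phase 1: resolution of four samples around the prediction.
    if d0 < 60:
        d1 = best(grid(d0 - 11, d0 + 12, 4))
    else:
        d1 = best(grid(d0 - 15, d0 + 16, 4))
    # Phase 2: integer resolution.
    d2 = best(grid(d1 - 3, d1 + 3, 1))
    # Phase 3: fractional resolution, 1/4 sample below 92.5, else 1/2 sample.
    if d2 < 92.5:
        return best(grid(d2 - 0.75, d2 + 0.75, 0.25))
    return best(grid(d2 - 0.5, d2 + 0.5, 0.5))
```

For a perfectly periodic input with a pitch of 52 samples, d_(n−1)=52, T₀=−10, T_(c)=250 and c=5, the sketch returns 52, since only that value maps the last pulse of the current frame exactly onto T₀.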

The delay parameter d_(n) ∈ [34, 231] can be coded using nine bits per frame using a resolution of ¼ sample for d_(n)<92½ and ½ sample for d_(n)>92½.
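The nine-bit budget can be checked by counting: the ¼-sample grid from 34 to 92¼ contains 234 values and the ½-sample grid from 92½ to 231 contains 278 values, giving 512 = 2⁹ values in total. The following Python sketch shows one possible index mapping; the layout of the index and the function names encode_delay and decode_delay are assumptions made for illustration and are not specified in the description.

```python
def encode_delay(d):
    """Map a delay on the quantization grid to an index in 0 .. 511."""
    if d < 92.5:
        return round((d - 34.0) * 4)          # 1/4-sample steps: indices 0..233
    return 234 + round((d - 92.5) * 2)        # 1/2-sample steps: indices 234..511


def decode_delay(index):
    """Inverse of encode_delay."""
    if index < 234:
        return 34.0 + index / 4
    return 92.5 + (index - 234) / 2
```

In this layout the value 92½ is assigned to the ½-sample grid, which is a choice of the sketch.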

FIG. 8 illustrates delay interpolation when d_(n−1)=50, d_(n)=53, σ_(n)=172, and the frame length N=256. The interpolation method used in the illustrative embodiment of the signal modification method is shown in thick line whereas the linear interpolation corresponding to prior methods is shown in thin line. Both interpolated contours perform approximately in a similar manner in the delay selection loop of Table 1, but the disclosed piecewise linear interpolation results in a smaller absolute change |d_(n−1)−d_(n)|. This feature reduces potential oscillations in the delay contour d(t) and annoying artifacts in the modified speech signal whose pitch will follow this delay contour.

To further clarify the performance of the piecewise linear interpolation method, FIG. 9 shows an example of the resulting delay contour d(t) over ten frames with thick line. The corresponding delay contour d(t) obtained with conventional linear interpolation is indicated with thin line. The example has been composed using an artificial speech signal having a constant delay parameter of 52 samples as an input of the speech modification procedure. A delay parameter d₀=54 samples was intentionally used as an initial value for the first frame to illustrate the effect of pitch estimation errors typical in speech coding. Then, the delay parameters d_(n) both for the linear interpolation and the herein disclosed piecewise linear interpolation method were searched using the procedure of Table 1. All the parameters needed were selected in accordance with the illustrative embodiment of the signal modification method according to the present invention. The resulting delay contours d(t) show that piecewise linear interpolation yields a rapidly converging delay contour d(t) whereas the conventional linear interpolation cannot reach the correct value within the ten frame period. These prolonged oscillations in the delay contour d(t) often cause annoying artifacts to the modified speech signal, degrading the overall perceptual quality.

Modification of the Signal

After the delay parameter d_(n) and the pitch cycle segmentation have been determined, the signal modification procedure itself can be initiated. In the illustrative embodiment of the signal modification method, the speech signal is modified by shifting individual pitch cycle segments one by one, adjusting them to the delay contour d(t). A segment shift is determined by correlating the segment in the weighted speech domain with the target signal. The target signal is composed using the synthesized weighted speech signal ŵ(t) of the previous frame and the preceding, already shifted segments in the current frame. The actual shift is done on the residual signal r(t).

Signal modification has to be done carefully both to maximize the performance of long term prediction and to preserve the perceptual quality of the modified speech signal. The required time synchrony at frame boundaries has to be taken into account also during modification.

A block diagram of the illustrative embodiment of the signal modification method is shown in FIG. 10. Modification starts by extracting a new segment w_(s)(k) of l_(s) samples from the weighted speech signal w(t) in block 401. This segment is defined by the segment length l_(s) and starting instant t_(s), giving w_(s)(k)=w(t_(s)+k) for k=0, 1, . . . , l_(s)−1. The segmentation procedure is carried out in accordance with the teachings of the foregoing description.

If no more segments can be selected or extracted (block 402), the signal modification operation is completed (block 403). Otherwise, the signal modification operation continues with block 404.

For finding the optimal shift of the current segment w_(s)(k), a target signal w̃(t) is created in block 405. For the first segment w₁(k) in the current frame, this target signal is obtained by the recursion

w̃(t) = ŵ(t), t ≤ t_(n−1)
w̃(t) = w̃(t−d(t)), t_(n−1) < t < t_(n−1)+l₁+δ₁.   (11)

Here ŵ(t) is the weighted synthesized speech signal available in the previous frame for t ≤ t_(n−1). The parameter δ₁ is the maximum shift allowed for the first segment of length l₁. Equation (11) can be interpreted as simulation of long term prediction using the delay contour over the signal portion in which the current shifted segment may potentially be situated. The computation of the target signal for the subsequent segments follows the same principle and will be presented later in this section.

The search procedure for finding the optimal shift of the current segment can be initiated after forming the target signal. This procedure is based on the correlation c_(s)(δ′) computed in block 404 between the segment w_(s)(k) that starts at instant t_(s) and the target signal w̃(t) as

$$c_s(\delta') = \sum_{k=0}^{l_s - 1} w_s(k)\, \tilde{w}(k + t_s + \delta'), \quad \delta' \in \left[-\lceil \delta_s \rceil, \lceil \delta_s \rceil\right], \qquad (12)$$

where δ_(s) determines the maximum shift allowed for the current segment w_(s)(k) and ┌•┐ denotes rounding towards plus infinity. Normalized correlation can well be used instead of Equation (12), although with increased complexity. In the illustrative embodiment, the following values are used for δ_(s):

$$\delta_s = \begin{cases} 4\tfrac{1}{2} \text{ samples}, & d_{n-1} < 90 \text{ samples} \\ 5 \text{ samples}, & d_{n-1} \ge 90 \text{ samples} \end{cases} \qquad (13)$$

As will be described later in this section, the value of δ_(s) is more limited for the first and the last segment in the frame.
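The integer-resolution part of this search can be sketched in Python as follows. The function name best_integer_shift is hypothetical, the target signal is assumed to be stored in a list indexed by time and long enough for every candidate shift, and ties are resolved in favour of the smallest shift, which the description does not specify.

```python
import math


def best_integer_shift(segment, target, t_s, max_shift):
    """Integer shift maximizing the correlation of Equation (12).

    segment   -- samples w_s(k), k = 0 .. l_s - 1
    target    -- target signal samples, indexed by time
    t_s       -- starting instant of the segment
    max_shift -- delta_s, e.g. 4.5 or 5 samples
    """
    limit = math.ceil(max_shift)      # rounding towards plus infinity

    def corr(shift):
        return sum(segment[k] * target[k + t_s + shift]
                   for k in range(len(segment)))

    return max(range(-limit, limit + 1), key=corr)
```

If the pulse of the segment lies one sample earlier than the corresponding pulse of the target signal, the sketch returns a shift of +1.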

Correlation (12) is evaluated with an integer resolution, but higher accuracy improves the performance of long term prediction. To keep the complexity low, it is not reasonable to directly upsample the signal w_(s)(k) or $\tilde{w}(t)$ in Equation (12). Instead, a fractional resolution is obtained in a computationally efficient manner by determining the optimal shift using the upsampled correlation c_(s)(δ′).

The shift δ maximizing the correlation c_(s)(δ′) is searched first in the integer resolution in block 404. In a fractional resolution, the maximum value must then be located in the open interval (δ−1, δ+1), bounded to [−δ_(s), δ_(s)]. In block 406, the correlation c_(s)(δ′) is upsampled in this interval to a resolution of ⅛ sample using Hamming-windowed sinc interpolation of a length equal to 65 samples. The shift δ corresponding to the maximum value of the upsampled correlation is then the optimal shift in a fractional resolution. After finding this optimal shift, the weighted speech segment w_(s)(k) is recalculated in the solved fractional resolution in block 407. That is, the precise new starting instant of the segment is updated as t_(s):=t_(s)−δ+δ_(l), where δ_(l)=⌈δ⌉. Further, the residual segment r_(s)(k) corresponding to the weighted speech segment w_(s)(k) in fractional resolution is computed from the residual signal r(t) at this point, again using the sinc interpolation as described before (block 407). Since the fractional part of the optimal shift is incorporated into the residual and weighted speech segments, all subsequent computations can be implemented with the upward-rounded shift δ_(l)=⌈δ⌉.
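The fractional refinement of block 406 might be sketched as below. This is an illustration under stated assumptions rather than the embodiment: the kernel spans ±4 integer samples (65 taps at ⅛-sample resolution, endpoints included), correlation values outside the evaluated integer range are treated as zero, and the names `windowed_sinc` and `fractional_shift_search` are hypothetical.

```python
import math


def windowed_sinc(x, half_width=4.0):
    """Hamming-windowed sinc kernel; x is measured in integer-sample units."""
    if abs(x) >= half_width:
        return 0.0
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    window = 0.54 + 0.46 * math.cos(math.pi * x / half_width)
    return sinc * window


def fractional_shift_search(corr, best_int, delta_s, factor=8):
    """Refine an integer shift to 1/factor resolution.

    corr: dict mapping integer shift -> correlation (Equation (12)).
    Candidates lie in the open interval (best_int - 1, best_int + 1)
    and are bounded to [-delta_s, delta_s].
    """
    best_shift, best_val = float(best_int), corr[best_int]
    for step in range(-factor + 1, factor):
        shift = best_int + step / factor
        if abs(shift) > delta_s:
            continue
        # Interpolate the correlation at the fractional shift.
        val = sum(c * windowed_sinc(shift - d) for d, c in corr.items())
        if val > best_val:
            best_shift, best_val = shift, val
    return best_shift
```

Only the correlation sequence is interpolated, not the signals themselves, which is what keeps the cost low: each candidate needs one short dot product over the already computed correlation values.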

FIG. 11 illustrates recalculation of the segment w_(s)(k) in accordance with block 407 of FIG. 10. In this illustrative example, the optimal shift is searched with a resolution of ⅛ sample by maximizing the correlation, giving the value δ=−1⅜. Thus the integer part δ_(l) becomes ⌈−1⅜⌉=−1 and the fractional part ⅜. Consequently, the starting instant of the segment is updated as t_(s):=t_(s)+⅜. In FIG. 11, the new samples of w_(s)(k) are indicated with gray dots.
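The arithmetic of this example can be checked with exact fractions. The helper name `split_shift` is an illustrative assumption; it returns the upward-rounded shift δ_(l) and the amount δ_(l)−δ by which the starting instant t_(s) is advanced.

```python
import math
from fractions import Fraction


def split_shift(delta):
    """Split a fractional shift into (delta_l, start_advance).

    delta_l = ceil(delta); the segment start is updated as
    t_s := t_s - delta + delta_l, i.e. advanced by delta_l - delta.
    """
    delta_l = math.ceil(delta)
    return delta_l, delta_l - delta


# Example from the text: delta = -1 3/8 gives delta_l = -1 and an advance of 3/8.
delta_l, advance = split_shift(Fraction(-11, 8))
```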

If the logic block 106, which will be disclosed later, permits signal modification to continue, the final task is to update the modified residual signal $\check{r}(t)$ by copying the current residual signal segment r_(s)(k) into it (block 411): $\begin{matrix}{{{\check{r}\left( {t_{s} + \delta_{l} + k} \right)} = {r_{s}(k)}},\quad{k = 0},1,\ldots\;,{l_{s} - 1.}} & (14)\end{matrix}$ Since shifts in successive segments are independent of each other, the segments positioned into $\check{r}(t)$ either overlap or have a gap between them. Straightforward weighted averaging can be used for overlapping segments. Gaps are filled by copying neighboring samples from the adjacent segments. Since the number of overlapping or missing samples is usually small and the segment boundaries occur at low-energy regions of the residual signal, usually no perceptual artifacts are caused. It should be noted that no continuous signal warping as described in [2], [6], [7] is employed; instead, modification is done discontinuously by shifting pitch cycle segments in order to reduce the complexity.

-   [2] W. B. Kleijn, R. P. Ramachandran, and P. Kroon, “Interpolation of the pitch-predictor parameters in analysis-by-synthesis speech coders,” IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 1, pp. 42-54, 1994.
-   [6] Patent Application WO 00/11653, “Speech encoder with continuous warping combined with long term prediction,” Conexant Systems Inc., (Y. Gao), Filing Date 24 Aug. 1999.
-   [7] Patent Application WO 00/11654, “Speech encoder adaptively applying pitch preprocessing with continuous warping,” Conexant Systems Inc., (H. Su and Y. Gao), Filing Date 24 Aug. 1999.

Processing of the subsequent pitch cycle segments follows the above-disclosed procedure, except that the target signal $\tilde{w}(t)$ in block 405 is formed differently than for the first segment. The samples of $\tilde{w}(t)$ are first replaced with the modified weighted speech samples as $\begin{matrix}{{{\tilde{w}\left( {t_{s} + \delta_{l} + k} \right)} = {w_{s}(k)}},\quad{k = 0},1,\ldots\;,{l_{s} - 1.}} & (15)\end{matrix}$ This procedure is illustrated in FIG. 11. Then the samples following the updated segment are also updated, $\begin{matrix}{{{\tilde{w}(k)} = {\tilde{w}\left( {k - {d(k)}} \right)}},\quad{k = {t_{s} + \delta_{l} + l_{s}}},\ldots\;,{t_{s} + \delta_{l} + l_{s + 1} + \delta_{s + 1} - 2.}} & (16)\end{matrix}$ The update of the target signal $\tilde{w}(t)$ ensures higher correlation between successive pitch cycle segments in the modified speech signal considering the delay contour d(t), and thus more accurate long term prediction. While processing the last segment of the frame, the target signal $\tilde{w}(t)$ does not need to be updated.

The shifts of the first and the last segments in the frame are special cases which have to be performed particularly carefully. Before shifting the first segment, it should be ensured that no high power regions exist in the residual signal r(t) close to the frame boundary t_(n−1), because shifting such a segment may cause artifacts. The high power region is searched by squaring the residual signal r(t) as $\begin{matrix}{{{E_{0}(k)} = {r^{2}(k)}},\quad{k \in \left\lbrack {{t_{n - 1} - \zeta_{0}},{t_{n - 1} + \zeta_{0}}} \right\rbrack},} & (17)\end{matrix}$ where $\zeta_{0} = \left\langle {p\left( t_{n - 1} \right)}/2 \right\rangle$. If the maximum of E₀(k) is detected close to the frame boundary in the range [t_(n−1)−2, t_(n−1)+2], the allowed shift is limited to ¼ sample. If the proposed shift |δ| for the first segment is smaller than this limit, the signal modification procedure is enabled in the current frame, but the first segment is kept intact.

The last segment in the frame is processed in a similar manner. As was described in the foregoing description, the delay contour d(t) is selected such that in principle no shifts are required for the last segment. However, because the target signal is repeatedly updated during signal modification considering correlations between successive segments in Equations (15) and (16), it is possible that the last segment has to be shifted slightly. In the illustrative embodiment, this shift is always constrained to be smaller than 3/2 samples. If there is a high power region at the frame end, no shift is allowed. This condition is verified by using the squared residual signal $\begin{matrix}{{{E_{1}(k)} = {r^{2}(k)}},\quad{k \in \left\lbrack {{t_{n} - \zeta_{1} + 1},{t_{n} + 1}} \right\rbrack},} & (18)\end{matrix}$ where ζ₁=p(t_(n)). If the maximum of E₁(k) is attained for k larger than or equal to t_(n)−4, no shift is allowed for the last segment. As for the first segment, when the proposed shift |δ|<¼, the present frame is still accepted for modification, but the last segment is kept intact.
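The two boundary checks of Equations (17) and (18) can be sketched as follows. This is an illustration only: the function names are hypothetical, the residual is assumed to be a list indexed by absolute sample instant (with one sample of lookahead past t_(n)), and the pitch values are rounded to the nearest integer when forming ζ₀ and ζ₁.

```python
def first_segment_shift_limit(residual, t_prev, pitch_prev, default_limit):
    """Return the allowed shift for the first segment (Equation (17)).

    The limit drops to 1/4 sample when the squared residual peaks within
    two samples of the frame boundary t_prev.
    """
    zeta0 = round(pitch_prev / 2)  # assumption: rounding to nearest integer
    window = range(t_prev - zeta0, t_prev + zeta0 + 1)
    peak = max(window, key=lambda k: residual[k] ** 2)
    return 0.25 if abs(peak - t_prev) <= 2 else default_limit


def last_segment_shift_limit(residual, t_end, pitch_end):
    """Return the allowed shift for the last segment (Equation (18)).

    No shift is allowed when the squared residual peaks at or after
    t_end - 4; otherwise the shift must stay below 3/2 samples.
    """
    zeta1 = round(pitch_end)
    window = range(t_end - zeta1 + 1, t_end + 2)
    peak = max(window, key=lambda k: residual[k] ** 2)
    return 0.0 if peak >= t_end - 4 else 1.5
```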

It should be noted that, contrary to the known signal modification methods, the shift does not translate to the next frame, and every new frame starts perfectly synchronized with the original input signal. As another fundamental difference, particularly to RCELP coding, the illustrative embodiment of the signal modification method processes a complete speech frame before the subframes are coded. Admittedly, subframe-wise modification makes it possible to compose the target signal for every subframe using the previously coded subframe, potentially improving the performance. This approach cannot be used in the context of the illustrative embodiment of the signal modification method, since the allowed time asynchrony at the frame end is strictly constrained. Nevertheless, the update of the target signal with Equations (15) and (16) gives practically equal performance to subframe-wise processing, because modification is enabled only on smoothly evolving voiced frames.

Mode Determination Logic Incorporated into the Signal Modification Procedure

The illustrative embodiment of the signal modification method according to the present invention incorporates an efficient classification and mode determination mechanism as depicted in FIG. 2. Every operation performed in blocks 101, 103 and 105 yields several indicators quantifying the attainable performance of long term prediction in the current frame. If any of these indicators is outside its allowed limits, the signal modification procedure is terminated by one of the logic blocks 102, 104, or 106. In this case, the original signal is preserved intact.

The pitch pulse search procedure 101 produces several indicators on the periodicity of the present frame. Hence the logic block 102 analyzing these indicators is the most important component of the classification logic. The logic block 102 compares the difference between the detected pitch pulse positions and the interpolated open-loop pitch estimate using the condition $\begin{matrix}{{{{T_{k} - T_{k - 1} - {p\left( T_{k} \right)}}} < {0.2\,{p\left( T_{k} \right)}}},\quad{k = 1},2,\ldots\;,c,} & (19)\end{matrix}$ and terminates the signal modification procedure if this condition is not met.
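Condition (19) amounts to a simple check over consecutive pulse positions, sketched below. The function name and the representation of the interpolated open-loop pitch estimate as a callable `pitch_at` are illustrative assumptions.

```python
def pitch_pulses_consistent(pulse_positions, pitch_at):
    """Check Equation (19) for every pair of consecutive pitch pulses.

    pulse_positions: [T_0, T_1, ..., T_c], where T_0 is the last pulse of
    the previous frame. pitch_at: callable t -> p(t).
    """
    return all(
        abs(t_k - t_prev - pitch_at(t_k)) < 0.2 * pitch_at(t_k)
        for t_prev, t_k in zip(pulse_positions, pulse_positions[1:])
    )
```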

The selection of the delay contour d(t) in block 103 also gives additional information on the evolution of the pitch cycles and the periodicity of the current speech frame. This information is examined in the logic block 104. The signal modification procedure is continued from this block 104 only if the condition |d_(n)−d_(n−1)|<0.2 d_(n) is fulfilled. This condition means that only a small delay change is tolerated for classifying the current frame as a purely voiced frame. The logic block 104 also evaluates the success of the delay selection loop of Table 1 by examining the difference |κ_(c)−T₀| for the selected delay parameter value d_(n). If this difference is greater than one sample, the signal modification procedure is terminated.
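The two tests of logic block 104 can be combined in a short predicate, sketched below. The function and parameter names are assumptions; `kappa_c` and `t0` stand for the quantities κ_(c) and T₀ produced by the delay selection loop and are taken as given inputs here.

```python
def delay_contour_acceptable(d_prev, d_cur, kappa_c, t0):
    """Return True when logic block 104 lets signal modification continue.

    Requires |d_n - d_{n-1}| < 0.2 * d_n and |kappa_c - T_0| <= 1 sample.
    """
    small_delay_change = abs(d_cur - d_prev) < 0.2 * d_cur
    loop_converged = abs(kappa_c - t0) <= 1.0
    return small_delay_change and loop_converged
```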

For guaranteeing a good quality for the modified speech signal, it is advantageous to constrain shifts done for successive pitch cycle segments in block 105. This is achieved in the logic block 106 by imposing the criterion $\begin{matrix}{{\left| {\delta^{(s)} - \delta^{({s - 1})}} \right|} \leq \left\{ \begin{matrix}{{4.0\ \text{samples}},} & {d_{n} < {90\ \text{samples}}} \\{{4.8\ \text{samples}},} & {d_{n} \geq {90\ \text{samples}}}\end{matrix} \right.} & (20)\end{matrix}$ on all segments of the frame. Here δ^((s)) and δ^((s−1)) are the shifts done for the s^(th) and (s−1)^(th) pitch cycle segments, respectively. If the thresholds are exceeded, the signal modification procedure is interrupted and the original signal is maintained.
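The constraint of Equation (20) can be sketched as a check over the sequence of segment shifts in a frame. The function name and the list representation of the shifts are illustrative assumptions.

```python
def shifts_acceptable(shifts, d_n):
    """Check Equation (20) for all successive segment shifts in a frame.

    shifts: shifts of the pitch cycle segments in frame order, in samples.
    d_n: delay parameter of the current frame, in samples.
    """
    limit = 4.0 if d_n < 90 else 4.8
    return all(abs(cur - prev) <= limit for prev, cur in zip(shifts, shifts[1:]))
```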

When the frames subjected to signal modification are coded at a low bit rate, it is essential that the shape of pitch cycle segments remains similar over the frame. This allows faithful signal modeling by long term prediction and thus coding at a low bit rate without degrading the subjective quality. The similarity of successive segments can be quantified simply by the normalized correlation $\begin{matrix}{g_{s} = \frac{\sum\limits_{k = 0}^{l_{s} - 1}{{w_{s}(k)}{\tilde{w}\left( {k + t_{s} + \delta_{l}} \right)}}}{\sqrt{{\sum\limits_{k = 0}^{l_{s} - 1}{w_{s}^{2}(k)}}{\sum\limits_{k = 0}^{l_{s} - 1}{{\tilde{w}}^{2}\left( {k + t_{s} + \delta_{l}} \right)}}}}} & (21)\end{matrix}$ between the current segment and the target signal at the optimal shift after the update of w_(s)(k) in block 407 of FIG. 10. The normalized correlation g_(s) is also referred to as pitch gain.

Shifting of the pitch cycle segments in block 105 to maximize their correlation with the target signal enhances the periodicity and yields a high pitch prediction gain if the signal modification is useful in the current frame. The success of the procedure is examined in the logic block 106 using the criterion $g_{s} > 0.84.$ If this condition is not fulfilled for all segments, the signal modification procedure is terminated (block 409) and the original signal is kept intact. When this condition is met (block 106), the signal modification continues in block 411. The pitch gain g_(s) is computed in block 408 between the recalculated segment w_(s)(k) from block 407 and the target signal $\tilde{w}(t)$ from block 405. In general, a slightly lower gain threshold can be allowed on male voices with equal coding performance. The gain thresholds can be changed in different operation modes of the encoder for adjusting the usage percentage of the signal modification mode and thus the resulting average bit rate.
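The pitch gain of Equation (21) and the threshold test of logic block 106 can be sketched as follows. The function names, the list-based signals, and the convention of returning zero gain for an all-zero signal are assumptions made for illustration; the 0.84 default is the threshold quoted above.

```python
import math


def pitch_gain(segment, target, start):
    """Normalized correlation between a segment and the target signal.

    start: index t_s + delta_l at which the segment is aligned in target.
    """
    ref = target[start:start + len(segment)]
    num = sum(a * b for a, b in zip(segment, ref))
    den = math.sqrt(sum(a * a for a in segment) * sum(b * b for b in ref))
    return num / den if den > 0 else 0.0  # zero gain for silent input (assumption)


def frame_passes_gain_check(gains, threshold=0.84):
    """Signal modification continues only if every segment exceeds the threshold."""
    return all(g > threshold for g in gains)
```

Because the gain is normalized, a segment that is a scaled copy of the aligned target yields a gain of one regardless of the scale factor.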

Mode Determination Logic for a Source-Controlled Variable Bit Rate Speech Codec

This section discloses the use of the signal modification procedure as a part of the general rate determination mechanism in a source-controlled variable bit rate speech codec. This functionality is embedded in the illustrative embodiment of the signal modification method, since it provides several indicators on signal periodicity and the expected coding performance of long term prediction in the present frame. These indicators include the evolution of pitch period, the fitness of the selected delay contour for describing this evolution, and the pitch prediction gain attainable with signal modification. If the logic blocks 102, 104 and 106 shown in FIG. 2 enable signal modification, long term prediction is able to model the modified speech frame efficiently, facilitating its coding at a low bit rate without degrading subjective quality. In this case, the adaptive codebook excitation has a dominant contribution in describing the excitation signal, and thus the bit rate allocated for the fixed-codebook excitation can be reduced. When a logic block 102, 104 or 106 disables signal modification, the frame is likely to contain a non-stationary speech segment such as a voiced onset or rapidly evolving voiced speech signal. These frames typically require a high bit rate for sustaining good subjective quality.

FIG. 12 depicts the signal modification procedure 603 as a part of the rate determination logic that controls four coding modes. In this illustrative embodiment, the mode set comprises a dedicated mode for non-active speech frames (block 508), unvoiced speech frames (block 507), stable voiced frames (block 506), and other types of frames (block 505). It should be noted that all these modes except the mode for stable voiced frames 506 are implemented in accordance with techniques well known to those of ordinary skill in the art.

The rate determination logic is based on signal classification done in three steps in logic blocks 501, 502, and 504, of which the operation of blocks 501 and 502 is well known to those of ordinary skill in the art.

First, a voice activity detector (VAD) 501 discriminates between active and inactive speech frames. If an inactive speech frame is detected, the speech signal is processed according to mode 508.

If an active speech frame is detected in block 501, the frame is subjected to a second classifier 502 dedicated to making a voicing decision. If the classifier 502 rates the current frame as an unvoiced speech signal, the classification chain ends and the speech signal is processed in accordance with mode 507. Otherwise, the speech frame is passed through to the signal modification module 603.

The signal modification module then itself provides a decision on enabling or disabling the signal modification of the current frame in a logic block 504. This decision is in practice made as an integral part of the signal modification procedure in the logic blocks 102, 104 and 106 as explained earlier with reference to FIG. 2. When signal modification is enabled, the frame is deemed a stable voiced, or purely voiced, speech segment.
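The three-step classification chain of FIG. 12 reduces to the decision sketch below. The function name and the boolean inputs are illustrative assumptions standing in for the outputs of blocks 501, 502 and 504; the returned integers are the mode block numbers used in the text.

```python
def select_coding_mode(is_active, is_voiced, modification_enabled):
    """Return the coding mode block number of FIG. 12.

    508: non-active speech, 507: unvoiced, 506: stable voiced
    (signal modification enabled), 505: all other frames.
    """
    if not is_active:            # VAD decision, block 501
        return 508
    if not is_voiced:            # voicing decision, block 502
        return 507
    # Decision of logic block 504 (logic blocks 102, 104 and 106 of FIG. 2).
    return 506 if modification_enabled else 505
```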

When the rate determination mechanism selects mode 506, the signal modification mode is enabled and the speech frame is encoded in accordance with the teachings of the previous sections. Table 2 discloses the bit allocation used in the illustrative embodiment for the mode 506. Since the frames to be coded in this mode are characteristically very periodic, a substantially lower bit rate suffices for sustaining good subjective quality compared, for instance, to transition frames. Signal modification also allows efficient coding of the delay information using only nine bits per 20-ms frame, saving a considerable proportion of the bit budget for other parameters. Good performance of long term prediction makes it possible to use only 13 bits per 5-ms subframe for the fixed-codebook excitation without sacrificing the subjective speech quality. The fixed codebook comprises one track with two pulses, both having 64 possible positions.

TABLE 2
Bit allocation in the voiced 6.2-kbps mode for a 20-ms frame comprising four subframes.

Parameter             Bits/Frame
LP Parameters         34
Pitch Delay           9
Pitch Filtering       4 = 1 + 1 + 1 + 1
Gains                 24 = 6 + 6 + 6 + 6
Algebraic Codebook    52 = 13 + 13 + 13 + 13
Mode Bit              1
Total                 124 bits = 6.2 kbps

TABLE 3
Bit allocation in the 12.65-kbps mode in accordance with the AMR-WB standard.

Parameter             Bits/Frame
LP Parameters         46
Pitch Delay           30 = 9 + 6 + 9 + 6
Pitch Filtering       4 = 1 + 1 + 1 + 1
Gains                 28 = 7 + 7 + 7 + 7
Algebraic Codebook    144 = 36 + 36 + 36 + 36
Mode Bit              1
Total                 253 bits = 12.65 kbps

The other coding modes 505, 507 and 508 are implemented following known techniques. Signal modification is disabled in all these modes. Table 3 shows the bit allocation of the mode 505 adopted from the AMR-WB standard.
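As a consistency check on the two bit allocations, the per-parameter figures can be summed and converted to a bit rate. In the sketch below the dictionary keys and the helper name are illustrative assumptions; the per-subframe values are those listed in Tables 2 and 3, with four subframes per 20-ms frame.

```python
FRAME_MS = 20  # frame length in milliseconds

# Bits per frame in the voiced 6.2-kbps mode (Table 2).
TABLE_2 = {
    "lp_parameters": 34,
    "pitch_delay": 9,
    "pitch_filtering": 4 * 1,
    "gains": 4 * 6,
    "algebraic_codebook": 4 * 13,
    "mode_bit": 1,
}

# Bits per frame in the 12.65-kbps mode (Table 3).
TABLE_3 = {
    "lp_parameters": 46,
    "pitch_delay": 9 + 6 + 9 + 6,
    "pitch_filtering": 4 * 1,
    "gains": 4 * 7,
    "algebraic_codebook": 4 * 36,
    "mode_bit": 1,
}


def bit_rate_kbps(allocation, frame_ms=FRAME_MS):
    """Bits per millisecond equals kilobits per second."""
    return sum(allocation.values()) / frame_ms
```

Under these figures the voiced mode totals 124 bits per frame and the 12.65-kbps mode totals 253 bits per frame, so the stable voiced mode spends slightly less than half the bits of mode 505.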

The technical specifications [11] and [12] related to the AMR-WB standard are enclosed here as references on the comfort noise functionality in 508 and the VAD functionality in 501, respectively:

-   [11] 3GPP TS 26.192, “AMR Wideband Speech Codec: Comfort Noise Aspects,” 3GPP Technical Specification.
-   [12] 3GPP TS 26.193, “AMR Wideband Speech Codec: Voice Activity Detector (VAD),” 3GPP Technical Specification.

In summary, the present specification has described a frame synchronous signal modification method for purely voiced speech frames, a classification mechanism for detecting frames to be modified, and the use of these methods in a source-controlled CELP speech codec in order to enable high-quality coding at a low bit rate.

The signal modification method incorporates a classification mechanism for determining the frames to be modified. This differs from prior signal modification and preprocessing means in operation and in the properties of the modified signal. The classification functionality embedded into the signal modification procedure is used as a part of the rate determination mechanism in a source-controlled CELP speech codec.

Signal modification is done pitch and frame synchronously, that is, adapting one pitch cycle segment at a time in the current frame such that a subsequent speech frame starts in perfect time alignment with the original signal. The pitch cycle segments are limited by frame boundaries. This feature prevents time shift translation over frame boundaries, simplifying encoder implementation and reducing the risk of artifacts in the modified speech signal. Since time shift does not accumulate over successive frames, the signal modification method disclosed needs neither long buffers for accommodating expanded signals nor complicated logic for controlling the accumulated time shift. In source-controlled speech coding, it simplifies multi-mode operation between signal modification enabled and disabled modes, since every new frame starts in time alignment with the original signal.

Of course, many other modifications and variations are possible. In view of the above detailed illustrative description of the present invention and associated drawings, such other modifications and variations will now become apparent to those of ordinary skill in the art. It should also be apparent that such other variations may be effected without departing from the spirit and scope of the present invention.

1. A method for determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, comprising: dividing the sound signal into a series of successive frames; locating a feature of the sound signal in a previous frame; locating a corresponding feature of the sound signal in a current frame; and determining the long-term-prediction delay parameter for the current frame such that the long term prediction maps the signal feature of the previous frame to the corresponding signal feature of the current frame.
2. A method for determining a long-term-prediction delay parameter as defined in claim 1, wherein determining the long-term-prediction delay parameter comprises: forming a delay contour from the long-term-prediction delay parameter.
3. A method for determining a long-term-prediction delay parameter as defined in claim 2, wherein: the sound signal comprises a speech signal; the feature of the speech signal in the previous frame comprises a pitch pulse of the speech signal in the previous frame; the feature of the speech signal in the current frame comprises a pitch pulse of the speech signal in the current frame; and forming a delay contour comprises mapping, with the long term prediction, the pitch pulse of the current frame to the pitch pulse of the previous frame.
4. A method for determining a long-term-prediction delay parameter as defined in claim 3, wherein defining the long-term-prediction delay parameter comprises: calculating the long-term-prediction delay parameter as a function of distances of successive pitch pulses between a last pitch pulse of the previous frame and a last pitch pulse of the current frame.
5. A method for determining a long-term-prediction delay parameter as defined in claim 2, further comprising: fully characterizing the delay contour with a long-term-prediction delay parameter of the previous frame and the long-term-prediction delay parameter of the current frame.
6. A method for determining a long-term-prediction delay parameter as defined in claim 2, wherein forming a delay contour comprises: nonlinearly interpolating the delay contour between a long-term-prediction delay parameter of the previous frame and the long-term-prediction delay parameter of the current frame.
7. A method for determining a long-term-prediction delay parameter as defined in claim 2, wherein forming a delay contour comprises: determining a piecewise linear delay contour from a long-term-prediction delay parameter of the previous frame and the long-term-prediction delay parameter of the current frame.
8. A device for determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, comprising: a divider of the sound signal into a series of successive frames; a detector of a feature of the sound signal in a previous frame; a detector of a corresponding feature of the sound signal in a current frame; and a calculator of the long-term-prediction delay parameter for the current frame, the calculation of the long-term-prediction delay parameter being made such that the long term prediction maps the signal feature of the previous frame to the corresponding signal feature of the current frame.
9. A device for determining a long-term-prediction delay parameter as defined in claim 8, wherein the calculator of the long-term-prediction delay parameter comprises: a selector of a delay contour from the long-term-prediction delay parameter.
10. A device for determining a long-term-prediction delay parameter as defined in claim 9, wherein: the sound signal comprises a speech signal; the feature of the speech signal in the previous frame comprises a pitch pulse of the sound signal in the previous frame; the feature of the speech signal in the current frame comprises a pitch pulse of the speech signal in the current frame; and the delay contour selector is a selector of a delay contour mapping with the long term prediction the pitch pulse of the current frame to the pitch pulse of the previous frame.
11. A device for determining a long-term-prediction delay parameter as defined in claim 10, wherein the long-term-prediction delay parameter sub-calculator is: a calculator of the long-term-prediction delay parameter as a function of distances of successive pitch pulses between the last pitch pulse of the previous frame and the last pitch pulse of the current frame.
12. A device for determining a long-term-prediction delay parameter as defined in claim 9, further incorporating: a function fully characterizing the delay contour with the long-term-prediction delay parameter of the previous frame and the long-term-prediction delay parameter of the current frame.
13. A device for determining a long-term-prediction delay parameter as defined in claim 9, wherein the delay contour selector is: a selector of a nonlinearly interpolated delay contour between the long-term-prediction delay parameter of the previous frame and the long-term-prediction delay parameter of the current frame.
14. A device for determining a long-term-prediction delay parameter as defined in claim 9, wherein the delay contour selector is: a selector of a piecewise linear delay contour determined from the long-term-prediction delay parameter of the previous frame and the long-term-prediction delay parameter of the current frame.
15. A signal modification method for implementation into a technique for digitally encoding a sound signal, comprising: dividing the sound signal into a series of successive frames; partitioning each frame of the sound signal into a plurality of signal segments; and warping at least a part of the signal segments of the frame, said warping comprising constraining the warped signal segments inside the frame.
16. A signal modification method as defined in claim 15, wherein: the sound signal comprises pitch pulses; each frame comprises boundaries; and partitioning each frame comprises: locating pitch pulses in the sound signal of the frame; dividing the frame into pitch cycle segments each containing one of the pitch pulses and each located inside the boundaries of the frame.
17. A signal modification method as defined in claim 16, wherein: locating pitch pulses comprises using an open-loop pitch estimate interpolated over the frame; and the signal modification method further comprises terminating a signal modification procedure when a difference between positions of the located pitch pulses and the interpolated open-loop pitch estimate does not meet a given condition.
18. A signal modification method as defined in claim 15, wherein partitioning each frame of the sound signal into a plurality of signal segments comprises: weighting the sound signal to produce a weighted sound signal; and extracting the signal segments from the weighted sound signal.
19. A signal modification method as defined in claim 15, wherein the warping comprises: producing a target signal for a current signal segment; and finding an optimal shift for the current signal segment in response to the target signal.
20. A signal modification method as defined in claim 17, wherein: producing a target signal comprises producing a target signal from a weighted synthesized speech signal of a previous frame or from modified weighted speech signal; and finding an optimal shift for the current signal segment comprises performing a correlation between the current signal segment and the target signal.
21. A signal modification method as defined in claim 20, wherein performing a correlation comprises: first evaluating the correlation with an integer resolution to find a signal segment shift that maximizes the correlation; then upsampling the correlation in a region surrounding the correlation-maximizing signal segment shift, said upsampling of the correlation comprising searching an optimal shift of the current signal segment by maximizing the correlation with a fractional resolution.
22. A signal modification method as defined in claim 15, wherein: each frame comprises boundaries; warping at least a part of the signal segments of the frame comprises: detecting whether a high power region exists in the sound signal close to the frame boundary adjacent to a signal segment; and shifting the signal segment in relation to detection or absence of detection of a high power region.
23. A signal modification method as defined in claim 15, wherein the warping comprises: forming a delay contour defining an interpolated long term prediction delay parameter over the current frame and providing additional information about the evolution of the pitch cycles and the periodicity of the current sound signal frame; and shifting the individual pitch cycle segments one by one to adjust them to the delay contour.
24. A signal modification method as defined in claim 23, wherein shifting the individual pitch cycle segments comprises: forming a target signal using the delay contour; and shifting the pitch cycle segment to maximize the correlation of said pitch cycle segment with the target signal.
25. A signal modification method as defined in claim 23, further comprising: examining the information from the delay contour about the evolution of the pitch cycles and the periodicity of the current sound signal frame; and defining at least one condition related to the information given by the delay contour on the evolution of the pitch cycles and the periodicity of the current sound signal frame; and interrupting the signal modification when said at least one condition related to the information given by the delay contour about the evolution of the pitch cycles and the periodicity of the current sound signal frame is not satisfied.
26. A signal modification method as defined in claim 19, further comprising: constraining the shift of the signal segments, said constraining comprising imposing a given criteria to all the signal segments of the frame; and interrupting the signal modification procedure when the given criteria is not respected and maintaining the original sound signal.
27. A signal modification method as defined in claim 15, further comprising: detecting an absence of voice activity in the current frame of the sound signal; and selecting a signal-modification-disabled mode of coding the current frame of the sound signal in response to detection of the absence of voice activity in the current frame.
28. A signal modification method as defined in claim 15, further comprising: detecting a presence of voice activity in the current frame of the sound signal; rating the current frame as an unvoiced sound signal frame; and selecting a signal-modification-disabled mode of coding the current frame of the sound signal in response to: detection of a presence of voice activity in the current frame of the sound signal; and rating the current frame as an unvoiced sound signal frame.
29. A signal modification method as defined in claim 15, further comprising: detecting a presence of voice activity in the current frame of the sound signal; rating the current frame as a voiced sound signal frame; detecting that signal modification is successful; and selecting a signal-modification-enabled mode of coding the current frame of the sound signal in response to: detection of a presence of voice activity in the current frame of the sound signal; rating the current frame as a voiced sound signal frame; and detection that the signal modification is successful.
30. A signal modification method as defined in claim 15, further comprising: detecting a presence of voice activity in the current frame of the sound signal; rating the current frame as a voiced sound signal frame; detecting that signal modification is not successful; and selecting a signal-modification-disabled mode of coding the current frame of the sound signal in response to: detection of a presence of voice activity in the current frame of the sound signal; rating the current frame as a voiced sound signal frame; and detection that signal modification is not successful.
 31. A signal modificationdevice for implementation into a technique for digitally encoding asound signal, comprising: a first divider of the sound signal into aseries of successive frames; a second divider of each frame of the soundsignal into a plurality of signal segments; and a signal segment warpingmember supplied with at least a part of the signal segments of theframe, said warping member comprising a constrainer of the warped signalsegments inside the frame.
 32. A signal modification device as definedin claim 31, wherein: the sound signal comprises pitch pulses; eachframe comprises boundaries; and the second divider comprises: a detectorof pitch pulses in the sound signal of the frame; a divider of the frameinto pitch cycle segments each containing one of the pitch pulses andeach located inside the boundaries of the frame.
 33. A signal modification device as defined in claim 32, wherein: the detector of pitch pulses uses an open-loop pitch estimate interpolated over the frame; and the signal modification device further comprises a signal modification terminating member active when a difference between positions of the detected pitch pulses and the interpolated open-loop pitch estimate does not meet a given condition.
 34. A signal modification device as defined in claim 31, wherein the second divider of each frame of the sound signal into a plurality of signal segments comprises: a filter for weighting the sound signal to produce a weighted sound signal; and an extractor of the signal segments from the weighted sound signal.
 35. A signal modification device as defined in claim 31, wherein the signal segment warping member comprises: a calculator of a target signal for a current signal segment; and a finder of an optimal shift for the current signal segment in response to the target signal.
 36. A signal modification device as defined in claim 35, wherein: the calculator of a target signal is a calculator of a target signal from a weighted synthesized speech signal of a previous frame or from modified weighted speech signal; and the finder of an optimal shift for the current signal segment comprises a calculator of a correlation between the current signal segment and the target signal.
 37. A signal modification device as defined in claim 36, wherein the calculator of a correlation comprises: an evaluator of the correlation with an integer resolution to find a signal segment shift that maximizes the correlation; an upsampler of the correlation in a region surrounding the correlation-maximizing signal segment shift, said upsampler comprising a searcher of an optimal shift of the current signal segment, said searcher of an optimal shift of the current signal segment comprising an evaluator of the correlation with a fractional resolution.
 38. A signal modification device as defined in claim 34, wherein: each frame comprises boundaries; the signal segment warping member comprises: a detector of whether a high power region exists in the sound signal close to the frame boundary adjacent to a signal segment; and a shifter of the signal segment in relation to detection or absence of detection of a high power region.
 39. A signal modification device as defined in claim 31, wherein the signal segment warping member comprises: a calculator of a delay contour defining an interpolated long term prediction delay parameter over the current frame and providing additional information about the evolution of the pitch cycles and the periodicity of the current sound signal frame; and a shifter of the individual pitch cycle segments one by one to adjust them to the delay contour.
 40. A signal modification device as defined in claim 39, wherein the shifter of the individual pitch cycle segments comprises: a calculator of a target signal using the delay contour; and a shifter of the pitch cycle segment to maximize the correlation of said pitch cycle segment with the target signal.
 41. A signal modification device as defined in claim 40, further comprising: an evaluator of the information from the delay contour about the evolution of the pitch cycles and the periodicity of the current sound signal frame; and a definer of at least one condition related to the information given by the delay contour about the evolution of the pitch cycles and the periodicity of the current sound signal frame; and a terminator of the signal modification when said at least one condition related to the information given by the delay contour about the evolution of the pitch cycles and the periodicity of the current sound signal frame is not satisfied.
 42. A signal modification device as defined in claim 35, further comprising: a constrainer of the shift of the pitch cycle segments, said constrainer comprising an imposer of a given criteria to all segments of the frame; and a terminator of the signal modification procedure when the given criteria is not respected.
 43. A signal modification device as defined in claim 31, further comprising: a detector of an absence of voice activity in the current frame of the sound signal; and a selector of a signal-modification-disabled mode of coding the current frame of the sound signal in response to detection of the absence of voice activity in the current frame.
 44. A signal modification device as defined in claim 31, further comprising: a detector of a presence of voice activity in the current frame of the sound signal; a classifier for rating the current frame as an unvoiced sound signal frame; and a selector of a signal-modification-disabled mode of coding the current frame of the sound signal in response to detection of a presence of voice activity in the current frame of the sound signal; and rating the current frame as an unvoiced sound signal frame.
 45. A signal modification device as defined in claim 31, further comprising: a detector of a presence of voice activity in the current frame of the sound signal; a classifier for rating the current frame as a voiced sound signal frame; a detector that signal modification is successful; and a selector of a signal-modification-enabled mode of coding the current frame of the sound signal in response to: detection of a presence of voice activity in the current frame of the sound signal; rating the current frame as a voiced sound signal frame; and detection that signal modification is successful.
 46. A signal modification device as defined in claim 31, further comprising: a detector of a presence of voice activity in the current frame of the sound signal; a classifier for rating the current frame as a voiced sound signal frame; a detector that signal modification is not successful; and a selector of a signal-modification-disabled mode of coding the current frame of the sound signal in response to: detection of a presence of voice activity in the current frame of the sound signal; rating the current frame as a voiced sound signal frame; and detection that signal modification is not successful.
 47. A method for searching pitch pulses in a sound signal, comprising: dividing the sound signal into a series of successive frames; dividing each frame into a number of subframes; producing a residual signal by filtering the sound signal through a linear prediction analysis filter; locating a last pitch pulse of the sound signal of the previous frame from the residual signal; extracting a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame using the residual signal; and locating pitch pulses in a current frame using the pitch pulse prototype.
 48. A method for searching pitch pulses in a sound signal as defined in claim 47, further comprising: predicting the position of a first pitch pulse of the current frame to occur at an instant related to the position of the previously located pitch pulse and an interpolated open-loop pitch estimate at an instant corresponding to the position of the previously located pitch pulse; and refining the predicted position of said pitch pulse by maximizing a weighted correlation between the pulse prototype and the residual signal.
 49. A method for searching pitch pulses in a sound signal as defined in claim 48, further comprising: repeating the prediction of pitch pulse position and the refinement of predicted position until said prediction and refinement yields a pitch pulse position located outside the current frame.
 50. A device for searching pitch pulses in a sound signal, comprising: a divider of the sound signal into a series of successive frames; a divider of each frame into a number of subframes; a linear prediction analysis filter for filtering the sound signal and thereby producing a residual signal; a detector of a last pitch pulse of the sound signal of the previous frame in response to the residual signal; an extractor of a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame in response to the residual signal; and a detector of pitch pulses in a current frame using the pitch pulse prototype.
 51. A device for searching pitch pulses in a sound signal as defined in claim 50, further comprising: a predictor of the position of each pitch pulse of the current frame to occur at an instant related to the position of the previous located pitch pulse and an interpolated open-loop pitch estimate at said instant corresponding to the position of the previously located pitch pulse; and a refiner of the predicted position of said pitch pulse by maximizing a weighted correlation between the pulse prototype and the residual signal.
 52. A device for searching pitch pulses in a sound signal as defined in claim 51, further comprising: a repeater of the prediction of pitch pulse position and the refinement of predicted position until said prediction and refinement yields a pitch pulse position located outside the current frame.
 53. A method for searching pitch pulses in a sound signal, comprising: dividing the sound signal into a series of successive frames; dividing each frame into a number of subframes; producing a weighted sound signal by processing the sound signal through a weighting filter, the weighted sound signal being indicative of signal periodicity; locating a last pitch pulse of the sound signal of the previous frame from the weighted sound signal; extracting a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame using the weighted sound signal; and locating pitch pulses in a current frame using the pitch pulse prototype.
 54. A method for searching pitch pulses in a sound signal as defined in claim 53, further comprising: predicting the position of a first pitch pulse of the current frame to occur at an instant related to the position of the previously located pitch pulse and an interpolated open-loop pitch estimate at an instant corresponding to the position of the previously located pitch pulse; and refining the predicted position of said pitch pulse by maximizing a weighted correlation between the pulse prototype and the weighted sound signal.
 55. A method for searching pitch pulses in a sound signal as defined in claim 54, further comprising: repeating the prediction of pitch pulse position and the refinement of predicted position until said prediction and refinement yields a pitch pulse position located outside the current frame.
 56. A device for searching pitch pulses in a sound signal, comprising: a divider of the sound signal into a series of successive frames; a divider of each frame into a number of subframes; a weighting filter for processing the sound signal to produce a weighted sound signal, the weighted sound signal being indicative of signal periodicity; a detector of a last pitch pulse of the sound signal of the previous frame in response to the weighted sound signal; an extractor of a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame in response to the weighted sound signal, and a detector of pitch pulses in a current frame using the pitch pulse prototype.
 57. A device for searching pitch pulses in a sound signal as defined in claim 56, further comprising: a predictor of the position of each pitch pulse of the current frame to occur at an instant related to the position of the previous located pitch pulse and an interpolated open-loop pitch estimate at said instant corresponding to the position of the previously located pitch pulse; and a refiner of the predicted position of said pitch pulse by maximizing a weighted correlation between the pulse prototype and the weighted sound signal.
 58. A device for searching pitch pulses in a sound signal as defined in claim 57, further comprising: a repeater of the prediction of pitch pulse position and the refinement of predicted position until said prediction and refinement yields a pitch pulse position located outside the current frame.
 59. A method for searching pitch pulses in a sound signal, comprising: dividing the sound signal into a series of successive frames; dividing each frame into a number of subframes; producing a synthesized weighted sound signal by filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal through a weighting filter; locating a last pitch pulse of the sound signal of the previous frame from the synthesized weighted sound signal; extracting a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame using the synthesized weighted sound signal; and locating pitch pulses in a current frame using the pitch pulse prototype.
 60. A method for searching pitch pulses in a sound signal as defined in claim 59, further comprising: predicting the position of a first pitch pulse of the current frame to occur at an instant related to the position of the previously located pitch pulse and an interpolated open-loop pitch estimate at an instant corresponding to the position of the previously located pitch pulse; and refining the predicted position of said pitch pulse by maximizing a weighted correlation between the pulse prototype and the synthesized weighted sound signal.
 61. A method for searching pitch pulses in a sound signal as defined in claim 60, further comprising: repeating the prediction of pitch pulse position and the refinement of predicted position until said prediction and refinement yields a pitch pulse position located outside the current frame.
 62. A device for searching pitch pulses in a sound signal, comprising: a divider of the sound signal into a series of successive frames; a divider of each frame into a number of subframes; a weighting filter for filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal and thereby producing a synthesized weighted sound signal; a detector of a last pitch pulse of the sound signal of the previous frame in response to the synthesized weighted sound signal; an extractor of a pitch pulse prototype of given length around the position of the last pitch pulse of the previous frame in response to the synthesized weighted sound signal; and a detector of pitch pulses in a current frame using the pitch pulse prototype.
 63. A device for searching pitch pulses in a sound signal as defined in claim 62, further comprising: a predictor of the position of each pitch pulse of the current frame to occur at an instant related to the position of the previous located pitch pulse and an interpolated open-loop pitch estimate at said instant corresponding to the position of the previously located pitch pulse; and a refiner of the predicted position of said pitch pulse by maximizing a weighted correlation between the pulse prototype and the synthesized weighted sound signal.
 64. A device for searching pitch pulses in a sound signal as defined in claim 63, further comprising: a repeater of the prediction of pitch pulse position and the refinement of predicted position until said prediction and refinement yields a pitch pulse position located outside the current frame.
 65. A method for forming an adaptive codebook excitation during decoding of a sound signal divided into successive frames and previously encoded by means of a technique using signal modification for digitally encoding the sound signal, comprising: receiving, for each frame, a long-term-prediction delay parameter characterizing a long term prediction in the digital sound signal encoding technique; recovering a delay contour using the long-term-prediction delay parameter received during a current frame and the long-term-prediction delay parameter received during a previous frame, wherein the delay contour maps, with long term prediction, a signal feature of the previous frame to a corresponding signal feature of the current frame; forming the adaptive codebook excitation in an adaptive codebook in response to the delay contour.
 66. A device for forming an adaptive codebook excitation during decoding of a sound signal divided into successive frames and previously encoded by means of a technique using signal modification for digitally encoding the sound signal, comprising: a receiver of a long-term-prediction delay parameter of each frame, wherein the long-term-prediction delay parameter characterizes a long term prediction in the digital sound signal encoding technique; a calculator of a delay contour in response to the long-term-prediction delay parameter received during a current frame and the long-term-prediction delay parameter received during a previous frame, wherein the delay contour maps, with long term prediction, a signal feature of the previous frame to a corresponding signal feature of the current frame; and an adaptive codebook for forming the adaptive codebook excitation in response to the delay contour.
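
By way of non-limiting illustration only, the predict-and-refine pitch pulse search recited in claims 47 to 49 (and, with a different input signal, in claims 53 to 55 and 59 to 61) may be sketched as follows. The sketch forms no part of the claims. The function name, the parameter names, the search radius and the triangular weighting of the correlation are assumptions introduced for the example; the claims do not specify the form of the weighting or the width of the refinement region.

```python
def search_pitch_pulses(signal, prototype, last_pulse, pitch_at, frame_end,
                        search_radius=3):
    """Locate pitch pulses of the current frame by repeated predict-and-refine.

    signal     : samples (residual, weighted, or synthesized weighted signal)
                 indexed so that the current frame ends at frame_end (exclusive)
    prototype  : pulse prototype of odd length, centred on the pulse
    last_pulse : index of the last pitch pulse located in the previous frame
    pitch_at   : callable giving the interpolated open-loop pitch estimate
                 (in samples) at a given index (hypothetical interface)
    """
    half = len(prototype) // 2
    pulses = []
    prev = last_pulse
    while True:
        # Prediction: previous pulse position plus the pitch estimate there.
        predicted = prev + int(round(pitch_at(prev)))
        best_pos, best_score = predicted, float("-inf")
        # Refinement: maximize a weighted correlation around the prediction.
        for cand in range(predicted - search_radius,
                          predicted + search_radius + 1):
            if cand - half < 0 or cand + half >= len(signal):
                continue  # prototype would not fit inside the signal buffer
            corr = sum(p * signal[cand - half + k]
                       for k, p in enumerate(prototype))
            # Assumed triangular weight favouring the predicted position.
            weight = 1.0 - abs(cand - predicted) / (2.0 * (search_radius + 1))
            if weight * corr > best_score:
                best_pos, best_score = cand, weight * corr
        # Stop once the pulse falls outside the current frame; the second
        # test is a safety guard against a non-advancing position.
        if best_pos >= frame_end or best_pos <= prev:
            break
        pulses.append(best_pos)
        prev = best_pos
    return pulses


# Toy input: unit impulses at 10 (previous frame), 50, 91 and 130.
toy = [0.0] * 180
for n in (10, 50, 91, 130):
    toy[n] = 1.0
found = search_pitch_pulses(toy, [0.5, 1.0, 0.5], last_pulse=10,
                            pitch_at=lambda n: 40.0, frame_end=140)
print(found)
```

In this toy input the second pulse sits one sample later than a constant 40-sample pitch would predict, so the refinement step is what moves the estimate from 90 to 91. A practical implementation would take the prototype from the previous frame, as the claims recite, rather than use a fixed three-tap shape.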